Preface


Tests and measurements — their techniques and devices — are 
valuable aids to the kind of pupil evaluation that encourages 
optimum growth. However, there are teachers who either 
have blind faith in the efficacy of tests or are unreasonably 
skeptical concerning their use. Our aim is to help teachers use 
tests appropriately and constructively. 

This book is an outgrowth of the authors’ experiences in 
conducting various courses in “tests and measurements.” In 
these courses we have found it advisable to cover less mate- 
rial with more illustrative examples in order to achieve good 
communication. Even experienced teachers, whom both 
authors have met in extension classes, are frequently appre- 
hensive of the statistics and technicalities involved. Students 
preparing to be teachers also frequently doubt their ability to 
comprehend test usage. We have dared to presume that 
teachers and teacher candidates in other states are much like 
those encountered in Oregon: (1) They need to have the 
subject of testing presented with a minimum of statistics. (2) 
They need to know the limitations of tests. (3) They need to 
perceive the substantial aid that appropriately used tests can 
give. (4) They should have these “needs” met in an effective
manner. Our aim, then, is to present the basic features of 
tests and testing in terms understandable to classroom 
teachers. 

Brevity has been one of our guideposts, for the sheer bulk 
of some books on measurement intimidates the teachers who 
enroll in our classes. The desire to make our treatment brief 
has sometimes caused us trouble in the writing of this book — 
we should have liked to go into more detail in explaining 
contributing and conditioning factors in various situations. In 
the first draft the chapters were longer and more cautiously 
detailed, but one or the other author pared and whittled until
both agreed that the minimum for effective communication 
had been reached. We hope that users of the book who 
might have written it differently will bear in mind the desir- 
ability of brevity and simplicity in the presentation of basic 
material. 

Another of our guideposts has been the intent to direct the 
material to classroom applications. The question, “Is this 
section (or paragraph) pertinent to the kinds of work 
teachers do?” was asked repeatedly while the chapters were 
being written and in each author’s evaluation of the other’s 
chapters. We feel that we have come close to fulfilling our 
criterion of classroom pertinence. 

Some readers will not be able to share wholeheartedly our 
criticism of grades and personality tests. The senior author 
has prevented the expression of skepticism regarding these 
techniques from being even more emphatic. Our hope is that 
the presentation will stimulate thinking — as did the prepara- 
tion of the material. Instructors who encourage their students 
to discuss the points of view presented will find that students’ 
evaluations will elicit many of the concepts of evaluation and 
bring forth a recognition of the merits and shortcomings of 
testing devices. 

Our third guidepost was to keep the book in such form as 
to provide for flexible use. Brevity makes it possible for in- 
structors to develop their own points of emphasis. For those 
who plan further study, we have selected and annotated 
readings and provided study and discussion items. 

We wish to express our gratitude to the publishers of books 
and tests who have given us permission to use copyrighted 
materials. We also thank Mrs. Alta Diment, who typed and 
edited the manuscript from what was in many instances very 
rough copy. 

Denis Baron 

Harold W. Bernard 
CHAPTER ONE


Sampling Pupil Behavior 


In earlier times costly tunnels were driven into mountains 
to find out whether quartz outcroppings indicated the pres- 
ence of ore beneath the surface. Often the effort was fruitless 
— discouraging, time-consuming, and financially ruinous. 
Today it is possible to save time and money by using diamond 
drills, which, driven from various locations and at different 
angles, bring up cores, or samples, of the mountain’s interior. 
The extent of the ore body can be determined fairly accurately 
from these cores. Promising cores are sent to assayers, who 
determine by tests the amounts of lead, silver, and copper 
contained in the samples. The results of these tests — the cores 
and the assayer’s chemical analysis of them — make it possible 
to estimate the value of the ore that will be obtained if the 
expensive tunnel or shaft is driven. 

There are tools available to teachers today which, like the 
diamond drill, save time, energy, and frustration in working 
with pupils. These tools are tests: samples of the behavior
and traits of individual pupils that indicate quickly and with 
reasonable accuracy their status at a particular time and their 
potential. Tests provide samples of intelligence, knowledge, 
social orientation, and special aptitudes. They enable the 
teacher to get a clearer view of the “inner workings” of his
pupils. 

The miner takes a chance when he drives a tunnel; the pres- 
ence of ore does not ensure his success. Faulty earth may 
cause cave-ins, underground water may be expensive to com- 
bat, a slump in the stock market may create a poor sales 
field, and technological advances may reduce the demand for 
his ore. So, too, although the samples that the teacher obtains 
through tests may promise well, serious illness, a broken 
home, a shift in values (as during a war), or a quarrel with a
cherished friend may threaten the realization of what is
indicated by the tests.

Tests are not panaceas; they do not solve all the problems of
education and classroom management. They are useful devices
for sizing up pupils in a better [...]
and what they do must make the diagnoses. Tests provide
only data upon which to base a diagnosis. Educational and 
psychological tests do not make analyses or suggest what 
should be done; they give indications which may serve to 
sharpen and clarify the judgments teachers make on the 
basis of their experience, training, and understanding. 

Attitudes toward Tests 

As samples of behavior which yield indications, tests de- 
pend for their usefulness upon the attitude and knowledge 
of teachers. This entire book is directed toward the develop- 
ment of this essential background. Before we proceed, how- 
ever, let us examine some characteristic attitudes toward tests. 

1. Some persons have blind confidence in tests. They ap- 
pear to feel that all one has to do to solve an educational 
problem is to give a test, record results, and file the data. 
This attitude toward test results is, of course, absurd, because 
there are no simple answers in relation to complicated person- 
alities. Too many teachers give tests in order to discover that 
Johnny has an IQ of 90, is up to age-grade standards in most 
of his schoolwork, and is “average” in terms of personal and 
social adjustment. The results are recorded in his cumulative 
folder, and class work goes on in the same perfunctory
manner as before.

2. Tests are often regarded with an element of fear. Undue 
emphasis may have been placed on test results in the teacher’s 
school experiences, and he may fear that tests will produce 
similar anxiety in pupils. This attitude is unfortunate. It is 
not the tests which should be feared but the misuse of test 
data. If one is to be failed because of his test score, he has 
reason to be apprehensive. However, if tests are used to pro- 
mote understanding and diagnosis, they will be welcomed. 
Actually, most of us like to take tests — if the results are not 
to be used against us. Many people enjoy the tests in Time, 
Look, and the Reader’s Digest because nothing but
self-evaluation is based on the results. It is entirely possible that
children could learn to enjoy educational tests if the element of
threat were removed and if the results were useful to pupils 
and teachers in their efforts to achieve better understanding. 

3. Tests are regarded by some with tentative confidence. 
This attitude approaches the sound and realistic view
expressed by the person who says, “I’ll take the tests for what
they are worth and permit myself to be guided by the results.”

However, if the hesitating confidence is expressed as “I’ll use
the results as long as they agree with views I already hold,” 
the test results can serve little constructive purpose. 
representative questions which survey only part of the total
area of knowledge and interest. 

Measurement and Evaluation 

Words often make understanding possible, but they some- 
times cloud the issue. Unless the specific connotation of a 
word has been learned, prior interpretations may get in the 
way of understanding. Thus, measurement in education has 
a slightly but significantly different meaning from the same 
word applied to carpentry, a purchase of sugar, or a bank 
account. These types of measurement can be accurate — re- 
peated measurements would yield identical results — but meas- 
urement in education cannot be repeated with identical out- 
comes. It might clarify our thinking if the word evaluation 
were substituted for measurement. But since the word meas- 
urement appears in educational literature and in discussions, 
it is advisable to indicate its specific connotation. Measure- 
ment by testing may be considered as a means by which eval- 
uation is achieved. Evaluations are often made without the 
basic data supplied by measurement, but sound evaluation is 
based upon the results of measurement. 

Considerable measurement is involved in purchasing a 
home. The amount of floor space; the cost of brickwork, lum- 
ber, and wiring; and the size of the lot are among the measure- 
ments to be considered. The judgments based on these meas- 
urements are an evaluation. Further, some intangible items 
would enter the picture: the style of the house, convenience
of room arrangement in terms of family needs, and com- 
munity environs would be taken into account although they 
are beyond the limits of precise measurement. 

Similarly, some significant educational factors are beyond 
the limits of measurement by tests. For example, tests do not 
measure drive or motivation to use the knowledge or intelli- 
gence indicated by tests, nor do they measure the view that 
the pupil takes of learning or the appeal or effectiveness of
teaching. Such data as anecdotal records, the teacher’s
evaluation of his pupils, health data, and recorded observations of
play and social behavior should be used to supplement and 
validate test data. Both teachers and pupils must realize that, 
as in buying a home, measurement is at best a basis for eval- 
uation. By means of it one can arrive at a more accurate 
evaluation than could be achieved by trial and error or by 
personal opinion. 

By providing the measurements upon which we can base 
evaluations, educational tests can do much to help improve 
the effectiveness of instruction and guidance. (1) Tests can 
help to estimate the present potential of the pupil to learn. 
(2) They can give fairly accurate information regarding the 
pupil’s academic knowledge. (3) They can show about how 
much a pupil has grown in a given period of time and thus 
help to evaluate the efficiency of methods of teaching. (4) 
Tests can help to locate specific areas of difficulty (though 
they do not tell what should be done about the difficulty). 
(5) Properly used, tests can be a factor in the motivation 
of pupils. (6) Test results can give guidance in the more 
equitable grouping of pupils for the purpose of economy in 
instruction. (7) Tests can provide clues to intelligent guid- 
ance of pupils in their academic choices and their personal 
adjustment. (8) Tests can provide supplementary data
leading to a more objective evaluation of pupil status and
progress.

It should be noted, however, that tests do not completely 
accomplish any of these things. They only help by providing 
clues and corroborative data.


TYPES OF TESTS AND MEASUREMENTS

Tests differ widely with regard to their nature and the
purposes they are designed to serve. According to the way they
are designed and used, tests may be classed as verbal or
nonverbal, performance or pencil-and-paper, and group or
individual. A verbal test is one in which language plays a major
part. The ability of the pupil to speak, read, and write de- 
termines in a major degree his effectiveness on this kind of 
test. His ability to repeat statements and his ability to follow 
written or spoken directions are sampled by verbal tests. Non- 
verbal tests indicate the pupil’s ability to see the similarity or 
dissimilarity between pictorial materials or geometric figures, 
follow mazes, or put parts of a puzzle together. Speed of 
manipulation, accuracy of movement, and sharpness of per- 
ception are sampled by this type of test, and the use of lan- 
guage is minimized but not eliminated. In a performance test, 
the subject may be asked to maneuver blocks into a pictured 
design, place the parts of a picture-board puzzle together, or 
repeat a series of digits given to him orally. In a pencil-and- 
paper test, the subject records his own answers. He checks the 
answers he selects, draws his way through a maze, or com- 
putes the answer to an arithmetic problem. Quite often, al- 
though not always, performance tests are nonverbal and pen- 
cil-and-paper tests are largely verbal. A group test is simply 
a test which a number of pupils take simultaneously. An in- 
dividual test is one which requires one examiner for each ex- 
aminee. 

Tests may also be classified as to their purposes. Some 
are designed to sample aptitudes, others achievement, and still 
others specific difficulties. The most common aptitude test is 
the intelligence test, which is designed to indicate the pupil’s 
capacity to learn. Musical-ability tests and reading-readiness 
tests are other commonly used aptitude tests. Achievement 
tests indicate the pupil’s level of performance in specific aca- 
demic areas such as reading, spelling, arithmetic, language 
usage, and comprehension of vocabulary. Tests designed to 
indicate specific areas of difficulty are called diagnostic tests. 
A diagnostic test in arithmetic may serve to indicate whether 
the pupil’s specific difficulty is in multiplying, adding, or
dividing or whether there is some particular number
combination that he has learned incorrectly and uses consistently,
e.g., 6 × 9 is 52.


Scales and Inventories 

Many of the instruments that are helpful for more objective 
pupil evaluation are called by other names than “tests.” The 
quality of a sixth grader’s handwriting is difficult to evaluate, 
but a handwriting scale on which there are graded examples 
of writing may serve to objectify the judgment. Spelling
scales group together words that have a similar degree of
difficulty and familiarity and arrange them in order of
increasing difficulty. There are rating scales which are used to
record interpersonal evaluations. (See Chapter 11.) For example, [...]
a clue to the person’s total adjustment in home life, school
life, interests, and views regarding what is correct and incor- 
rect in daily behavior. 

The questionnaire technique is often used to study atti- 
tudes and interests. The subject is asked to indicate whether 
or not he agrees with a number of statements regarding cer- 
tain situations — for example, the activities and requirements 
of school. Again, no one question is regarded as giving an in- 
contestable clue to the respondent’s orientation, but the total 
score is regarded as indicative of trends in values and atti- 
tudes. 


Projective Techniques 

Projective techniques are coming to be regarded as val- 
uable clues to personality. The projective technique permits 
the subject to “add structure to an unstructured situation”; 
that is, the individual injects his own meaning — theoretically, 
his own personality — into the situation. Specifically, a child 
may be given a set of dolls and doll furniture and told to do 
anything he wants to with them. What he does and how he 
treats the playthings is considered to be a reflection of his own 
personality. A picture with ambiguous content may be shown 
to the subject. What he describes as the content of the picture 
is, in part at least, a product of his own imagination. He may 
be told part of a story and asked to finish it; the ending he 
provides is considered a reflection of his inner feelings, past 
experiences, and wishes. 

Obviously, it is hazardous to interpret these “projections” 
of the pupil’s personality too precisely. But if used cautiously 
and interpreted in the light of more objective supplementary 
data, including interviews and observations, projective tech- 
niques may provide valuable clues in understanding children. 
Projective techniques illustrate the fact that a distinction must 
be made between measurement and evaluation. Results from 
these devices must be interpreted or evaluated; hence it is
necessary that the teacher know something of the nature and 
dynamics of personality. Projective techniques include such 
activities as interpreting pictures, completing partially told 
stories or incomplete sentences, playing with toys, describing 
what one sees in ink blots or pictured cloud formations, 
painting, drawing, modeling clay, and writing stories and 
poems. 


STANDARDIZATION 

It is important that teachers understand the significance 
test lead the teacher to repudiate teacher-made tests, which
have a place in a balanced program of evaluation. Teacher- 
made tests can be made to fit the local situation better; they 
can more pointedly refer to what has been taking place in a 
particular classroom; and they are economical devices for 
practice and drill. 


SUMMARY 

Tests are instruments for “sizing up” pupils. They yield 
clues, or “measures,” based on sampling techniques, which 
can be used (when properly interpreted) to evaluate the moti- 
vation, ability, and growth of individual pupils. As yet there is 
no single test which provides enough clues to evaluate the 
“whole child.” Hence, a balanced program for sizing up pupils 
will include a variety of tests which differ in mode of con- 
struction and in purpose. Used as a means to better eval- 
uation, tests can save the time and energy of teachers and 
pupils, just as samples taken by diamond drills save the time 
and energy of the miner. 


STUDY AND DISCUSSION EXERCISES 

1. How does the growth principle which states that “growth is 
a product of the interaction of the organism with its environment” 
bear upon the measurement concept in education? 

2. What could the classroom teacher do to make test taking a 
less emotionally upsetting experience? 

3. In your own words, distinguish between evaluation, or ap- 
praisal, and measurement. 

4. List a number of things that tests do not do. 

5. How do you think a classroom teacher might most advan- 
tageously use a personality inventory? 

6. In what ways might a projective technique be superior to a 
personality inventory? 
SUGGESTED ADDITIONAL READINGS


Cook, Walter W.: “Achievement Tests,” in Walter S. Monroe 
(ed.), Encyclopedia of Educational Research, rev. ed., New York: 
The Macmillan Company, 1950, pp. 1461-1478. 

An excellent presentation of the uses and limitations of achieve- 
ment tests, concepts used in evaluation, and problems involved 
in test construction. The extensive bibliography is especially 
commendable. 


Durost, Walter N.: “Tests and the Junior High School Guidance
Counselor,” Test Service Notebook, no. 2, Yonkers, N.Y.:
World Book Company, Division of Test Research and Service,
this and other pamphlets may be obtained on request. This
Evaluation in Modern Education, New York: American Book
Company, 1956, pp. 3-59. 

The meaning of evaluation and measurement, the historical 
background of evaluation, recent trends, steps in evaluation, 
types of evaluative devices, and characteristics of good test in- 
struments are discussed. 



CHAPTER TWO 


How to Identify a “Good” Test 


As we have seen, tests are used to evaluate certain personality
traits or increments of growth. Some provide better
bases for evaluations than others; hence tests themselves have
cooperative is also bright, teachers are often surprised when 
a quiet or sullen lad achieves a high score on a test or when 
a blond, curly-headed little girl in a starched pinafore makes 
a low score. The element of subjectivity often leads to mis- 
taken evaluations. 

If a test is to increase objectivity, however, it must itself 
be objective; in it, the exercise of the teacher’s judgment must 
be reduced to a minimum. The answers to questions must be 
easy to interpret as either right or wrong and must leave little 
or no occasion for the teacher to say, “I’m sure he had the 
right idea in mind; I’ll give him credit.” Objectivity is in- 
creased by the use of short-answer test questions such as 
simple-recall (in which one word will correctly answer the 
question), true-false, multiple-choice, and matching ques- 
tions. If the material of these short-answer questions were dealt 
with in a hundred-word composition, the possibility of objec- 
tivity in scoring would be decreased; that is, different scorers 
would assign different values to the answer. However, even in 
the essay answer it has been found that objectivity is increased 
through the use of pre-formed model answers. Some intelli- 
gence tests provide such model answers, assigning varying 
weights, or values, to correct but differing answers. For ex- 
ample, some items on individual intelligence tests may be 
answered in several ways, but the sample answers provided 
in the manual give the test scorer some basis for scoring re- 
sponses. By and large, the classroom teacher achieves objec- 
tivity by using the scoring key provided with the tests and by 
carefully adhering to printed directions for administration, 
scoring, and interpretation. 

The Criterion of Validity 

Since tests are designed to give an accurate picture of some 
aspect of the personality of the pupil, it is important that they 
actually measure what they are designed to measure. This 
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characteristic of tests is called validity. A test is valid when 
the results it obtains correspond closely with those obtained 
by means of other criteria evaluating the same trait. 

The meaning of validity may be made clear by describing 
an invalid “arithmetic” test which consisted of twenty items, 
each accompanied by a lengthy discussion intended to clarify
the problem. Many of the pupils did ten of the twenty problems
[...] to finish the entire test in the time
allowed. Study of the results indicated that the pupils’ arithmetic
scores corresponded very closely to their scores on reading [...].
It was suspected that the test actually tested reading as much
successful in schoolwork, to have achieved enviable occupational
status, earned superior incomes, established stable marriages, 
developed a variety of constructive avocational interests, and 
in many ways to be well on the road to eminence as partici- 
pating citizens and personally effective individuals. Hence the 
tests are considered to be valid from the standpoint of the 
“test of living.” 

It is advisable at this point to explain the concept of the 
coefficient of correlation, which makes it easier to compre- 
hend some of the characteristics of a good test. A coefficient 
of correlation (or of validity, or reliability) is a number which 
indicates the dependability of the predictions that are made in 
terms of that number. Wind direction can always be inferred 
from the direction of the smoke drift; the correlation between 
the two is 1.00, or perfect. The volume of a gas is inversely 
proportional to the pressure exerted upon it, temperature
remaining constant; the coefficient of correlation is -1.00.
Relationships among human traits lie somewhere between these
two extremes. The correlation between two valid and relia- 
ble measures of intelligence would probably be about .90. The 
correlation between intelligence and reading would be some- 
where in the vicinity of .50 or .60 (read this “point five zero” 
or “point six zero”). The correlation between size and intelli- 
gence would be positive but so low as to make predictions 
for individuals extremely dubious — about .10 to .20. These 
figures are not percentages. They are figures which indicate 
how much credence can be placed in predictions based on 
that particular coefficient. 

Table 1 should be read somewhat as follows: A coefficient 
of correlation of .90 increases forecasting efficiency by 56 per 
cent over pure guess and provides 78 chances out of 100 of 
predicting correctly from one measure the approximate level 
of performance on another measure. There are 22 chances out 
of 100 that the prediction will be incorrect. Even a coefficient 
Table 1*

                  Percentage increase    Chances in 100 of predicting
Correlation       in predictive          at-or-above, and below,
coefficient       efficiency             average in future behavior

   0.00                 0.0                  50-50
   0.10                 0.5                  50.25-49.75
   0.20                 2.0                  51-49
   0.30                 5.0                  52.5-47.5
   0.40                 8.0                  54-46
   0.50                13.0                  56.5-43.5
   0.60                20.0                  60-40
   0.70                29.0                  64.5-35.5
   0.80                40.0                  70-30
   0.90                56.0                  78-22
   0.95                69.0                  84.5-15.5
   0.98                80.0                  90-10

* [...] Darley, Studying Students: Guidance Methods of Individual
Analysis, Chicago: Science Research Associates, Inc., 1952, p. 54.

of correlation of .99 is only [...] per cent better than chance.
(The chances in 100 of [...]

Henry E. Garrett, [...] Education, New [...]
.40 to .70 denotes substantial or marked relationship;
.70 to 1.00 denotes high to very high relationship. 
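
The figures in Table 1 appear to follow what statistics texts call the index of forecasting efficiency, E = 1 - sqrt(1 - r^2). The table does not name its formula, so this identification is an assumption, as is the rule that the larger "chance" equals 50 plus half of E. A minimal Python sketch under those assumptions (the function names are illustrative only):

```python
import math


def forecasting_efficiency(r):
    """Per cent improvement over pure guess, assuming E = 1 - sqrt(1 - r^2)."""
    return 100.0 * (1.0 - math.sqrt(1.0 - r * r))


def chances_in_100(r):
    """Assumed rule: the favorable chance is 50 plus half the efficiency."""
    favorable = 50.0 + forecasting_efficiency(r) / 2.0
    return favorable, 100.0 - favorable


if __name__ == "__main__":
    coefficients = (0.00, 0.10, 0.20, 0.30, 0.40, 0.50,
                    0.60, 0.70, 0.80, 0.90, 0.95, 0.98)
    for r in coefficients:
        efficiency = forecasting_efficiency(r)
        favorable, unfavorable = chances_in_100(r)
        print(f"r = {r:.2f}  efficiency = {efficiency:5.1f}%  "
              f"chances = {favorable:.1f}-{unfavorable:.1f}")
```

The computed values agree with the table to within its rounding. The steep curve is the point the text makes: efficiency stays low until the coefficient is well above .90.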

The coefficient of correlation may be used to give a numer- 
ical indication of the validity of a test. If the test were per- 
fectly valid, the coefficient of validity would be expressed by 
the number 1.00. Each pupil would have the same rank in 
schoolwork as on the test, if rank in schoolwork were the cri- 
terion by which the validity of the test was judged. In other 
words, with perfect validity (1.00), an individual’s rank would
be the same on the test as in a ranking by experts or by school 
marks. Actually, this does not happen in practice, but the 
amount of shift in relative position is relatively small in a 
highly valid test. This is illustrated in the following tabula- 
tions: 


Subject    Rank on test    Rank in schoolwork

   A            1                  1
   B            2                  3
   C            3                  2
   D            4                  5
   E            5                  4


According to some methods of computing correlations, the 
coefficient in the above illustration (between the test and the 
criterion of rank in schoolwork) is .80, a fairly typical co- 
efficient for mental tests. If the rank in schoolwork of pupils 
B and C and of pupils D and E were reversed, there would be 
perfect correlation between test results and rank in
schoolwork. The closer the coefficient of validity is to plus 1.00, i.e.,
the closer the test is to the criterion, the better it measures 
what it purports to measure. In selecting tests the teacher 
should consult published reviews and catalogues of the test 
to learn the basis on which the validity of the test was estab- 
lished (the criterion of success) as well as the validity coef- 
ficient claimed for the test. Typically, validity coefficients are 
lower than reliability coefficients, so differences between the
two need not make the selector of tests apprehensive.
Although the coefficient of validity should be as high as
possible, it need not be as high as .70 or .80. Lee J. Cronbach
has cited tests used in making military classifications which
had coefficients as low as .45 but which were useful in
predicting performance in specified military activities.
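
The coefficient of .80 cited for the five-pupil tabulation can be reproduced with the rank-difference (Spearman) formula, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)). The text says only "some methods of computing correlations," so the choice of this formula is an assumption, and the function name is illustrative:

```python
def rank_difference_correlation(ranks_a, ranks_b):
    """Spearman's rho for two equal-length lists of untied ranks."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("need two equal-length lists of at least two ranks")
    # Sum of squared differences between each pupil's two ranks.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * sum_d2 / (n * (n * n - 1))


# Ranks for pupils A through E, copied from the tabulation.
rank_on_test = [1, 2, 3, 4, 5]
rank_in_schoolwork = [1, 3, 2, 5, 4]

if __name__ == "__main__":
    print(rank_difference_correlation(rank_on_test, rank_in_schoolwork))
```

Swapping B with C and D with E in either list makes every rank difference zero, and the function then returns 1.0, the perfect correlation the text describes.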


The Criterion of Reliability 
The coefficient in this case is called the coefficient of
reliability. For a given test, this index number will be found in
the manual and in published reviews. If the publisher of the 
test does not indicate the coefficient of reliability, it is prob- 
ably so low that advertising it would do little good. Hence, 
teachers would do well to use tests for which the indicated re- 
liability is somewhere between .80 and 1.00, remembering, 
of course, that it will never actually be 1.00.*

Each of two or three different tests of intelligence may be 
reliable although their results do not concur. Thus, three tests 
with reliabilities of .82 or better were given to one subject 
with the following results: test A — IQ 92, test B — IQ 115, 
test C — IQ 123. However, on equivalent forms of each of the 
three tests, the scores of the same individual varied no more 
than seven IQ points. This observation is made to impress 
upon teachers that it is important to indicate what test is be- 
ing referred to when an IQ is reported. It is also important 
to know that scores on a test in which there is a large non- 
verbal factor (test C above) will often vary markedly from 
scores on a test in which language facility is an important fac- 
tor. In short, a test that is reliable in itself may seem unre- 
liable when it is compared with another type of test. 

The Criterion of Comparability 
Another desirable characteristic of tests is comparability. 
If the teacher is to gain anything more from the test than 
a knowledge of the status of a pupil at the time he took the 
test, which is helpful but limited information, he must use 
tests which are comparable. This will make it possible to see 
how much the pupil has grown in a given period of time. 


* Leona E. Tyler cites a table which shows how specified percentile
scores may be interpreted in terms of given coefficients of reliability.
See Leona E. Tyler, The Work of the Counselor, New York:
Appleton-Century-Crofts, Inc., 1953, p. [...]17.
That is, a test is given at the beginning of a unit of work or
semester and a comparable (equivalent) test is administered
at the end of the period. The difference between the scores
indicates how much the pupil has grown during the period.

If the tests are not comparable, they present a distorted view
of the pupil. For example, a reading test with supposedly
comparable forms was administered by two teachers. One gave
form A first and the other gave form B first, and each gave
the alternate form at the end of an intensive reading-instruction [...]
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“practice effect.” Some test manuals instruct the teacher to 
subtract a given number of points from the second test to 
correct for practice effect and hence obtain comparable re- 
sults. Regardless of whether form D, C, B, or A is given first, 
practice effect should probably be considered in interpreting 
the score on the second test. 

The Criterion of “Sampling” Adequacy 

If a test is to reveal how much a pupil knows, it must sam- 
ple adequately, that is, it must contain enough questions to be 
truly representative. Let us assume that two pupils are being 
tested in geography. One of them knows one of the ten items 
of information on the test and the other knows nine of the 
ten. A test of two items is given. If it includes the one item 
the first pupil knows, he will get a score of 50 per cent; and 
if it happens to include the one item the second pupil does 
not know, he, too, will get 50 per cent. Yet one knows nine 
times as much as the other. Warped views of test subjects can 
be avoided by adequate sampling, i.e., by including enough 
items so that the gambling chance is minimized. Adequacy of 
sampling is achieved by including enough items so that addi- 
tional items no longer seem to influence the score of the indi- 
vidual. At the same time, tests usually include only enough 
items to minimize the chance factor, for additional items do 
not seem to add to the accuracy of the test. Some tests, how- 
ever, have a limited usefulness in spite of inadequate sam- 
pling. One intelligence test, for example, consists of fifteen 
items. In the hands of a clinical psychologist it possesses the 
advantages of a rough but rapid screening device. But persons 
who do not fully appreciate the handicap of limited sampling 
might easily form an erroneous opinion of the test subject on 
the basis of this test. 
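
The geography illustration can be worked out exactly. The sketch below is ours and is not drawn from any test manual. It assumes that the items of a short test are picked at random from the ten items of information, and the function name score_distribution is hypothetical. It lists the chance of each possible score for the pupil who knows one item and for the pupil who knows nine.

```python
from fractions import Fraction
from math import comb


def score_distribution(known, pool, test_len):
    """Exact chance of each number-correct score when test_len items are
    drawn at random from a pool of which the pupil knows `known` items."""
    total = comb(pool, test_len)
    dist = {}
    for correct in range(test_len + 1):
        # Ways to draw `correct` known items and the rest unknown items.
        ways = comb(known, correct) * comb(pool - known, test_len - correct)
        if ways:
            dist[correct] = Fraction(ways, total)
    return dist


# Two-item test drawn from the ten items in the text's example.
weak = score_distribution(known=1, pool=10, test_len=2)
strong = score_distribution(known=9, pool=10, test_len=2)

for label, dist in (("knows 1 of 10", weak), ("knows 9 of 10", strong)):
    for correct, chance in sorted(dist.items()):
        print(f"{label}: {correct} of 2 correct with chance {chance}")
```

Under these assumptions each pupil has one chance in five of scoring 50 per cent on the two-item test, so a single short test can easily misrepresent either of them. Setting test_len to 10 leaves only one possible score for each pupil, and the two scores can no longer coincide.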

Some standardized tests have what are known as long and 
short forms. The longer form is used when there is plenty of 
time for administration. However, experimental administrations
have indicated that the short form samples widely 
enough so that there is little difference in its reliability as com- 
pared with the longer form. Using fewer questions than are 
provided in the short form leads to such variability in results 
that further reduction in the number of items is considered 
to be inadvisable. Using more items than the long form in- 
cludes does not give more consistent results; rather, the addi- 
tional items are subject to the law of diminishing returns. 
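
The diminishing return from added items can be illustrated with the Spearman-Brown formula, a standard result in test theory that this text does not itself cite. The sketch assumes that added items are of the same kind and quality as the original ones. The starting reliability of .80 is a hypothetical figure, not taken from any published test.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is made `length_factor` times as long
    (a factor below 1 means the test is shortened)."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


short_form = 0.80  # hypothetical reliability of the short form

for factor in (0.5, 1, 2, 4):
    predicted = spearman_brown(short_form, factor)
    print(f"{factor} x length: predicted reliability {predicted:.2f}")
```

With these assumed figures, halving the test lowers the predicted reliability by about .13, doubling it raises the figure by only about .09, and doubling it again adds about .05. Each added block of items buys less than the one before it.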

The Cost Factor 

The use of tests in obtaining a clearer view of the abilities 
of pupils is sometimes limited by cost factors. School-board 
members and administrators often feel that the cost of tests 
is prohibitive. It is therefore desirable to get the least expen- 
sive tests available for the advantages derived. Fortunately, 
the most costly tests are not always the most dependable from 
the point of view of adequacy, reliability, or validity. Many 
tests that are relatively inexpensive on a per-pupil basis give 
quite valuable clues to understanding pupils. Expense is fur- 
ther reduced by the provision of scoring sheets. Thus, the 
same test booklet can be used over and over by inserting new 
answer sheets. This makes it possible to reduce the financial 
outlay to a penny or two per pupil once the booklets have 
been purchased. 
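
The saving from reusable booklets can be checked with simple arithmetic. In the sketch below every price is hypothetical and chosen only for illustration, and the function name cost_per_pupil is our own. It assumes that booklets are bought once, that a fresh package of answer sheets is bought for each yearly administration, and that both are sold in packages of the same size.

```python
from fractions import Fraction
from math import ceil


def cost_per_pupil(booklet_pack, sheet_pack, pack_size, pupils, years):
    """Average dollars per pupil tested over `years` yearly administrations.
    Booklets are bought once; answer sheets are bought again every year."""
    packs = ceil(pupils / pack_size)
    booklets_once = packs * booklet_pack
    sheets_total = packs * sheet_pack * years
    return (booklets_once + sheets_total) / (pupils * years)


booklets = Fraction("1.70")  # hypothetical price of 25 booklets
sheets = Fraction("0.50")    # hypothetical price of 25 answer sheets

for years in (1, 5, 10):
    cost = cost_per_pupil(booklets, sheets, pack_size=25, pupils=25, years=years)
    print(f"{years} year(s): {float(cost) * 100:.2f} cents per pupil tested")
```

With these assumed prices the first year costs 8.8 cents per pupil, and the average falls toward the two cents of the answer sheet alone as the same booklets are used year after year.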

It is hard to generalize about monetary costs. As the 
teacher examines tests, he should study the various test 
catalogues to see how much a package of twenty-five tests will 
cost, whether the manual of directions is included in the price, 
and whether or not there are separate answer sheets available. 
The price tag is not a highly dependable criterion, because 
one must consider that the objective is to get the best test for 
the price paid. 
Nor is the question of economy limited to a consideration 
of financial outlay alone. Good tests should be economical of 
the teacher’s time — that is, the results should be easily scor- 
able. Preferably, the teacher should be able to score a test 
simply by counting the correct or incorrect responses. Little 
computation should be required, for computation not only 
takes time but provides a chance for error to creep in and 
thus reduces the reliability of the test. Economy of time should 
also be considered in giving the test. Specifically, the direc- 
tions should be easy to understand and easy to explain clearly 
to the pupils. Examination of the manuals of directions of 
two different tests of the same knowledge or ability will in- 
dicate that they differ widely with regard to the ease with 
which they can be understood and administered. 

The Test Manual 

In order to be of maximum usefulness to the teacher, the 
test should include a manual of directions. This manual 
should do the following specific things: (1) It should explain 
the specific advantages, features, and purposes of the test. 
(2) It should explain the process of its standardization so 
that the teacher will know how much confidence can be 
placed in the results obtained. (3) It should give clear and 
concise directions for administering the test. The total time 
to be allowed and the time allotment for individual parts 
should be clearly indicated. (4) Even if the scoring pro- 
cedures seem perfectly obvious, they should be described in 
detail. (5) Considerable space should be devoted to an in- 
terpretation of the scores. The meaning of specific scores in 
terms of average grade placement, quality of work, or com- 
parative standing in a group should be indicated. (6) Sugges- 
tions should be made for the intelligent use of the results in 
pupil guidance. 

SUMMARY 


Tests differ widely in the care with which they are con- 
structed, and they are not of equal value in achieving an ob- 
jective view of the pupils. The relative worth of tests can be 
judged in terms of their objectivity, validity, and reliability. 
The better tests are economical of time, effort, and money; 
they sample widely; and the manuals that accompany them 
are sufficiently detailed to indicate exactly how the test should 
be viewed, administered, scored, and interpreted. Tests that 
meet these criteria will be of real help to the teacher in estab- 
lishing realistic but growth-inducing goals for pupils. 


STUDY AND DISCUSSION EXERCISES 

1. Does the desirability of objectivity imply that the teacher’s 
judgments have no place in evaluation and appraisal? Explain your 
answer. 

This chapter, “The Criteria of a Good Examination,” deals 
with validity, reliability, adequacy, objectivity, administrability, 
scorability, comparability, economy, and utility of tests. 

Ross, C. C.: Measurement in Today's Schools, 3d ed. (revised by 
J. C. Stanley), Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1954, 
pp. 106-135. 

This chapter deals with the characteristics of a satisfactory 
measuring instrument. In addition to defining the terminology 
used in this discussion, the author makes suggestions on the uses 
and limitations of tests. 

Rulon, Phillip J.: “Validity of Educational Tests,” Test Service 
Notebook, no. 3, Yonkers, N.Y.: World Book Company, Division 
of Test Research and Service, 1947, 4 pp. 

This free leaflet discusses validity in terms of the objectives of 
education and the criterion of behavior and describes how tests 
are designed to obtain greater validity. Relative merits of short- 
answer and essay tests are discussed. 

Trow, W. C.: Educational Psychology, 2d ed., Boston: Houghton 
Mifflin Company, 1950, pp. 326-365. 

A discussion of the terminology used in testing procedures. 
Reliability, sampling, correlation, validity, objectivity, and types 
of questions are treated. Suggestions for summarizing test re- 
sults are given. 

Wrightstone, J. Wayne, Josiah Justman, and Irving Robbins: 
Evaluation in Modern Education, New York: American Book 
Company, 1956, pp. 16—28. 

In addition to the objectives for a testing program, this chap- 
ter deals with the relation of evaluation techniques to the 
curriculum. 



CHAPTER THREE 


Choosing the Right Test 


If there were one best test for evaluating any one pupil trait 
or ability, there would be no problem involved in test selec- 
tion. The one best test would have proven itself, and custom 
and common practice would make it widely known. However, 
this happy situation would necessarily impose some limitations.
For example, the test would have to be widely applicable;
it would not be adaptable to the problems of a particular 
school system. Users would be obligated to pay whatever 
price was charged for it. The publisher of the test could probably
afford to be slow in rendering service to users and might 
refuse to make any changes in a test so widely accepted ... 

The necessity for test selection carries with it some ... 

Capitalizing on Group Wisdom 

Since there is an opportunity involved in the task of test 
selection, several teachers should be given the privilege of 
participating. It is advantageous to bolster the wisdom of one 
or two individuals with the suggestions and advice of others, 
and pooling this knowledge will also help teachers to realize 
the values and shortcomings of tests as well as the problems 
involved in using them. 

It has proved advantageous for the teachers in a school to 
select jointly the tests that will be most profitable for them 
locally. This is sometimes done in the larger systems by having 
a committee formulate the objectives of the testing program 
and choose the tests in the light of the purposes they are to 
serve. In smaller systems all teachers might well be involved 
in the preliminary discussion of purposes and problems. The 
group approach is especially advantageous when different 
forms of the same tests can be used in several consecutive 
grades, thus gaining the advantage of the criterion of com- 
parability discussed in Chapter 2. The sixth-grade teacher, 
for example, can then compare the score of a pupil in his 
grade (on an alternate form of the test selected) with the 
score obtained by that pupil on another form of the same test 
when he was in the fifth or the fourth grade. If different tests 
are used, although both are valid and reliable, the results may 
not be comparable, and if the scores are unwittingly compared, 
the comparison will be misleading. Thus teachers administer- 
ing both the Metropolitan and Iowa achievement tests have 
found that the class average is as much as half to a full grade 
higher on one test than on the other. The scores of individual 
pupils vary as much as a full grade on the two tests. Both 
tests are accurate, reliable, and valid; they were simply not 
standardized simultaneously for purposes of comparison. 

The group approach also has the advantage of capitalizing 
upon the experience of various teachers with different tests. 
Pupil reactions and typical problems in administration can be 
anticipated, many of which may not be described in the test 
manual. 


DETERMINING THE PURPOSES OF THE 
TESTING PROGRAM 

Whether tests are selected by each teacher individually for 
his own class or by a group of teachers for a number of grades, 
the first step is to determine objectives— to decide what the 
testing program is designed to do. Let us assume that the list 
would include some or all of the following objectives: 

To gain insights into the social facility of the pupils as individuals 

To get general information about their aptitudes for learning in ... 

To get information about their specific talents or relative strengths 

To discover their present status in subject-matter achievement 

To ... specific weaknesses 

To motivate ... put forth ... and serious effort 

This list of objectives suggests that tests are to be used 
not as ends in themselves but as ... improvement of 
learning and instruction. They are designed to facilitate 
each child’s growth and ... results to be used as 
corroborative data and as supplements to ... observations.¹ 
An experienced teacher will know without being told 

Ihis point can hardlv 

z « ,h -». ssfffr Th = 
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P ' C,al,st "> illustrates this: 




that the tests that are most appropriate in the fifth grade are 
not necessarily those that provide valuable clues in the first 
grade. In formulating objectives, the group must take into 
consideration the fact that the testing program is a whole- 
school program. We shall therefore approach the problem of 
selecting tests in terms of a minimum program for the primary 
grades, the intermediate grades, and the upper grades. 

SIGNIFICANT AREAS TO BE TESTED 

It might appear that the most important test in every grade 
is the intelligence test. However, as we shall see in a later 
chapter, rate of intellectual growth has not become steady 
enough in the first grade to make the intelligence test a wholly 
reliable indication of growth. Giving intelligence tests at this 
level involves the risk that a child may be branded as unin- 
telligent because he does not understand the directions in a 
group test or because for some reason it is difficult for him to 
maintain interest. The results of one type of test, however, 
are much less likely to be stigmatizing after a year or two: 
the reading-readiness test, which many teachers prefer to 
the intelligence test. This device has the advantage of dealing 
rather specifically with an aptitude that is of immediate prac- 
tical importance to the teacher and the pupil. The results are 
directly applicable to the question of whether reading experi- 
ences should be initiated at once or whether it would be better 
to spend time on a developmental readiness program. An IQ 


“Don’t mistrust your own observation about a child. Even a competent 
oculist or ophthalmologist does not have the advantage of seeing symptoms
of visual difficulty after the child has been using his eyes for a 
prolonged period. His tests may be very good — as far as they go. The 
teacher sees the child in a functional situation.” Evaluation of the 
child’s vision is parallel in this sense to the evaluation of his intelligence
or social adaptation. 

will not yield this information; an MA would come closer to 
telling the teacher what he needs to know. 

Intelligence tests may be used, with due reservations, in the 
primary grades, but some teachers prefer to wait until the third 
grade or even the intermediate grades. Selection of a test at 
this level, however, should be influenced by the experience of 
the teachers involved. If none of the teachers in the group has 
had any experience with the tests, it is wise to apply to a 
teacher in another system for suggestions. A letter addressed 
to the superintendent of schools of a city system would be 
turned over to competent persons (perhaps specialists in test- 
ing) who would be willing to make helpful suggestions. 

After some tests have been suggested, the group should 
obtain the description and evaluation of these tests from the 
basic reference book, Oscar Krisen Buros’ Mental Measurements
Yearbook. The value of this work can be partially determined
by reading a representative entry such as the following:² 


[255] 

Pintner General Ability Tests: Verbal Series. Grades kgn-2, 2-4, 
4-9, 9+; 1923-46; 20¢ per manual; World Book Company. 

a) ... 1923-46; Forms ...; ... per 25; 35¢ per specimen test; Rudolf 
Pintner, Bess V. Cunningham, and Walter N. Durost. 

b) Pintner-Durost Elementary Test. Grades 2.5-4.5; Scales 1 ... 
set; (45) ... Rudolf ... 

c) Pintner Intermediate Test. Grades 4.5-9.5; ... 
of Pintner Intelligence Test; Forms ...; $1.70 per 25; 35¢ per 
specimen set; $1.20 per 25 machine-scorable answer sheets; 45 
(55) minutes; Rudolf Pintner. 

d) Pintner Advanced Test. Grade 9 and above; 1938–42; Forms 
A, B; $1.70 per 25; 35¢ per specimen set; $1.20 per 25 machine-scorable
answer sheets; Rudolf Pintner. 

² Oscar Krisen Buros (ed.), The Third Mental Measurements Yearbook, 
New Brunswick, N.J.: Rutgers University Press, 1949, p. 334. 

Following these basic data are reviews and evaluations of 
this test by competent reviewers. Excerpts from two of these 
reviews follow:³ 

Stanley S. Marzolf, Professor of Psychology, Illinois State Normal 
University, Normal, Ill. 

The reliabilities obtained by the split-half and interform meth- 
ods for the various batteries are, in the majority of cases, in ex- 
cess of .90. Sources of reliability data are given in all instances. 

Standardization has been based on “approximately 100,000 
tests from widely separated parts of the country.” Further collection
of scores for normative purposes is now in progress. 

The computation of deviation IQs is amply explained and il- 
lustrated. For the Intermediate and Advanced Tests a monograph 
which facilitates computation of IQs and centile equivalents is 
provided. 

This series is one of the best available for school use. The tests 
are easy to give and score. Raw scores are easily converted to a 
normative form. The same score system — standard score, mental 
age, and deviation IQ — is used throughout the series. The attempt 
to make the tests comparable at all grade levels is commendable, 
even though empirical evidence that this has been accomplished 
is lacking. 

D. A. Worcester, Chairman, Department of Educational Psychology
and Measurements, The University of Nebraska, Lincoln, Neb. 

The intermediate and advanced tests each have eight subtests. 
All are timed but with limits so liberal, intentionally, that they are 
not to be considered as speed tests. The materials of the tests are 
on the whole of the kind that one finds in most of the conventional 
intelligence tests. 

Each test of the series has received careful statistical treatment 


³ Ibid., p. 336. 

and the statistical findings are given in the manuals. Norms for 
the tests are articulated with each other, making possible comparable
measures at the various age levels. Scores may be interpreted
in almost any way which the user may wish: standard 
scores, ratio or deviation IQs, percentile ranks, mental ages, or 
grade equivalents. Machine scoring is available for the inter- 
mediate and the advanced tests. While the task of administering 
these tests is somewhat greater than that for some of the tests con- 
structed more recently, there is evidence that they have been con- 
structed with care and may be employed with good results. 


In the Yearbook several hundred tests are listed, classified, 
and evaluated; hence the suggestions of experienced users of 
the tests can save time and prevent the possibility of the teach- 
er’s becoming dismayed by a plethora of titles, addresses, and 
statistics. If the Yearbook is not available locally, it can prob- 
ably be borrowed from the state library or the state depart- 
ment of education. 

If it is not possible to get hold of the Yearbook, it is wise 
to obtain catalogues from the test publishers. Although this 
list should not be interpreted as endorsing any individual test, 
such addresses as the following will provide a starting point: 
Educational Test Bureau, 720 Washington Avenue, S.E., 
Minneapolis 14, Minnesota; California Test Bureau, 5916 
Hollywood Boulevard, Los Angeles 28, California; American 
Council on Education, 744 Jackson Place, Washington 6, 
D.C.; World Book Company, Yonkers 5, New York; Science 
Research Associates, Inc., 57 W. Grand Avenue, Chicago 10, 
Illinois; Psychological Corporation, 522 Fifth Avenue, New 
York 36, New York. 


The Yearbook and the addresses of publishers will, of 
course, be helpful in selecting all kinds of tests, not merely 
those relating to mental ability. 

Achievement tests are designed to indicate the pupil's present
status regarding skill and knowledge in such subject-matter
areas as reading, vocabulary, arithmetic fundamentals, 
arithmetic operations, spelling, English usage, etc. Advanced 
batteries for use in the upper grades and high school sample, 
in addition, such areas as literature, history, civics, and 
geography. Norms for achievement tests are typically de- 
scribed in terms of age standards, grade placement, and per- 
centile ranks. Again, these norms (or averages) represent the 
scores of typical third, fifth, sixth, etc. graders and are not to 
be considered as standards for individual children to reach 
or excel. Nor should the teacher regard the norm as a stand- 
ard that should be achieved by his class this year. Achieve- 
ment scores must be interpreted in terms of indicated pupil 
potential; that is, the average on achievement tests of the class 
this year may be in part evaluated in terms of the average ob- 
tained on a test of mental ability. What is important is that the 
norms provide clues for evaluating the status and progress of 
individual pupils. The achievement test selected should pro- 
vide equivalent or comparable forms, since maximum utility 
will be obtained when the score a pupil makes this year can 
be compared with his score of a year or two ago. If a dif- 
ferent test is used, even a test covering the same area, the 
feasibility of comparing them is materially reduced. 

If a particular pupil does not seem to be doing so well on 
achievement tests as his mental-ability test “promised,” an ex- 
planation may be derived from a diagnostic test. In such sub- 
ject areas as reading, arithmetic, and spelling, a diagnostic test 
serves the purpose of locating rather specifically the difficulty 
the child is encountering. In reading, the difficulty may be a 
weak vocabulary, lack of method in word attack, or lack of 
experience leading to interpretative ability. The test does not 
tell what should be done; it simply narrows the area of search 
for a constructive remedial program. Similarly, in arithmetic 
the diagnostic test will help one find a specific area of difficulty,
which might be a particular erroneous number combination,
such as 7 × 6 = 52, or failure to understand borrowing
in subtraction. Or perhaps the pupil understands the 
processes but does not make the correct choice of operations; 
that is, perhaps he does not understand the written problem. 
In order to detect particular areas of difficulty, the diagnostic 
test is divided into distinct parts which employ specific opera- 
tions (addition, subtraction, etc.) and particular number 
combinations. 


Teachers often find that special-aptitude tests are of value 
in understanding the “whole” child. Tests of musical aptitude, 
mechanical aptitude, and art may be helpful in suggesting 
academic approaches which will permit the pupils to experience
a degree of success and thus become more strongly 
motivated. Language-aptitude tests, mathematics-aptitude 
tests, and vocational-aptitude tests are useful in academic and 
vocational counseling at the secondary school level. 

Teachers have traditionally been, and many of them are at 
present, concerned mainly with the academic adjustment and 
achievement of pupils. However, increasingly teachers are 
realizing that other phases of adjustment are of equal importance.
In fact, personal and social adjustment may be of 
more importance immediately than academic adjustment ... 
to function well in the academic situation ... 
... personality evaluation ... 
most readily available of these methods is direct ... 
... psychology, mental hygiene, and abnormal ... 
teacher’s observations more penetrating and his evaluations 
more accurate. Rating schedules also are of real value in 
narrowing the range of search for possible sources of dif- 
ficulty in personality adjustment. These schedules are of two 
kinds: one in which other pupils or the teacher rate a pupil 
in terms of given characteristics, and one in which the pupil 
rates himself in terms of given qualities or reactions. Person- 
ality questionnaires are similar to rating scales except that in- 
stead of rating one’s self on a three- or five-point scale, the 
subject answers the questions with “Yes,” “No,” or “Ques- 
tionable.” However, the results of formal questionnaires and 
rating schedules must be interpreted in terms of how the pupil 
behaves in the classroom and on the playground. 

Personality rating scales and questionnaires cover a variety 
of areas of functioning. Some deal with health attitudes, ethi- 
cal considerations, family relationships, and interpersonal ad- 
justments. These evaluative techniques are steadily increasing 
in number. Hence it is advisable that the teacher study the 
catalogues, tentatively select a few tests, obtain specimen sets, 
and then carefully read the manual to determine whether each 
specific test fills the needs of his situation. 


APPLYING THE CRITERIA OF GOOD TESTS 

After deciding what areas are to be tested and after select- 
ing sample tests, the teacher group should study the tenta- 
tively selected tests in each area from the viewpoint of the 
criteria of good tests. In order to discover the relative merits 
of the tests in the various areas, the significant data regarding 
each test should be summarized or tabulated on a check list; 
these data, it will be remembered, may be taken from test 
catalogues, the manuals of directions, or the Mental Measurements
Yearbook. A sample check list is shown in Figure 1. 

The check list obviously does not automatically select the 







5. Good screening device. Should be supplemented by observation 
and intelligence-test data. 

6. Brief, easily understandable. Norms may be interpreted without 
skill in statistics. Predictive value varies with methods used in 
teaching reading. 

7. Excellent manual. Contains bibliography valuable for teaching 
suggestions. 

8. Heavy stress on letter symbols involved in reading. 

9. Other aspects of intelligence besides “academic” aptitude are considered. 

10. Tests of spatial relationships, logical reasoning, numerical reasoning,
and verbal concepts. 

11. Takes somewhat longer than many other tests because of breakdown
into part scores. 

12. Verbal and nonverbal scores. 

13. Valuable because it involves less reading than many other tests. 

14. Much of the explanation is in terms of the authors’ experience, 
which is, however, considered more than adequate by reviewers. 

15. Correlates well with school achievement particularly for group ... 

16. Considered by reviewers to be somewhat weak in terms of individual
predictions. 

17. Ease of scoring is main feature. 

18. Because of variation in courses of study, curricular validity must ... 

19. Subtests cover all fundamental school subjects. Primary battery 
contains word and phrase recognition, word meaning, and num- ... 

20. There are partial batteries for those who wish to give the test in ... 

21. Covers items found in the typical curriculum. Must be interpreted 
in the light of local emphases. 

22. Superior and practical. 

23. All the tests ... which should, however, 
be used in conjunction with supplementary data. 

24. An attempt has been made to interpret scores in the light of
block-promotion practices. 

25. Not cited, but reviewers assert that ... have been carefully 
and competently constructed. 

26. Brief, simple, clear instructions to pupils. Directions for interpretation
are clear and practical. 

27. National norms should probably be considered seriously only in 
the skill areas. 

28. Gives percentile rank for adjustment in terms of home, school, 
health, and social areas for various school levels. 

29. User would need training in testing and mental hygiene. 

30. Can be used for rough psychiatric screening. 

31. The purposes of the test are inadequately disguised, and falsification
of answers may be possible. 

32. Offers some remedial approaches which, though acceptable, are 
somewhat superficial. 

33. Subtests are: self-reliance, sense of personal worth, sense of 
personal freedom, feeling of belonging, withdrawing tendencies, 
and nervous symptoms. 

34. A coding system prevents subject from discovering the exact nature
of the test. 

35. Calls attention of teachers to difficulties of adjustment of which 
they might otherwise be unaware. 



test. Each test seems to have some merits and some disad- 
vantages which are not found in others. Group discussion of 
relative values is advisable for the selection of the tests which 
are most appropriate for local purposes. 


Precautions 

produced by the change can be evaluated. By the same token, 
the new test should be given an adequate trial. It should be 
used for a minimum of three years, and if it is then found to 
be satisfactory, its continued use is justified. 


Follow-up Evaluation 

One aspect of the problem of test selection remains to be 
dealt with, even after the committee has chosen a given bat- 
tery and the tests have been administered: the tests should be 
discussed and evaluated in terms of the experience of ad- 
ministering them and making use of the results. Teachers find 
it helpful to discuss the problems they have encountered and 
get the suggestions of other teachers for overcoming these difficulties.
For example, such remarks as the following are likely 
to be made: “This test is too long for first graders.” “This 
achievement test does not parallel the suggested course of 
study for the state (or locality).” “This test is so short that 
I doubt that it samples adequately.” These remarks, however, 
should be viewed as precautions and limitations to be applied 
in interpreting test results, since all tests are likely to have 
their limitations. Some of the value of tests will be lost if a 
test is discarded because it does not fully suit all users. It is 
probably better to put up with some shortcomings in a test 
rather than dispense with the values of comparability that result
from using the same test over a period of years. 


SUMMARY 

Choosing tests which will make a maximum contribution 
to the understanding of children and the improvement of instruction
is a process which should involve many teachers. 
Even if the service ... teachers 
should participate ... (1) the work involved in choosing
has educative value, (2) the school and the individual 
teachers should gain the advantages of pooled experience and 
study, (3) teachers know more about local educational ob- 
jectives than experts, and (4) teachers know the needs of 
pupils in terms of their community background. 

The first task of the group is to determine as exactly as 
possible the purposes which the tests are to serve or to facili- 
tate. The kinds of tests should be named — intelligence, special 
aptitudes, achievement, personality inventories, etc.— and 
tentative lists should be suggested by capitalizing on the ex- 
perience of teachers on the staff or by contacting teachers or 
administrators outside the system. The group should obtain 
sample copies of the suggested tests, together with manuals, 
and ... the tests, using the manual, catalogues, and 
the Mental Measurements Yearbook, in terms of the criteria 
of ... 

The process of test selection and evaluation does not end 
with the ... of tests. It should be continued in the light of 
knowledge and experience gained during actual use of the 
tests. It should ... that, although selection is important,
the real issue is the interpretation and use 
of data. ... 

STUDY AND DISCUSSION EXERCISES 

1. Explain how it is that two tests covering the same subject
or ability may both be reliable and valid but not be comparable.

2. Draw up a list of purposes of a testing program for the
school where you teach or for some school with which you are
acquainted.

3. [...] are acquainted in the [...]

4. Explain the meaning of the statement, “Tests do not tell
what should be done.” If they do not do this, then of what value
are they?

5. Do you agree with the contention that the teacher’s judgment 
and estimate should be used in evaluating the capacities of pupils? 
How does this fit with the notion that evaluations should be ob- 
jective? 

6. Prepare a check list similar to the one presented in this chap- 
ter. Add to it some tests in several areas and make a tentative 
selection of the best test in each area to fit your particular needs. 

7. Why is the follow-up evaluation of the test important? How 
long should one keep an unsatisfactory test before sacrificing the 
value of year-by-year comparisons? 


SUGGESTED ADDITIONAL READINGS 


Buros, Oscar K. (ed.): The Fourth Mental Measurements Year- 
book, Highland Park, N.J.: The Gryphon Press, 1953. 

This book should be available to every committee charged with
responsibility for test selection. Older tests are reviewed and
evaluated in previous issues of the yearbook. All tests to date
are indexed.

“How to Select Tests,” Educational Bulletin No. 2, Los Angeles:
California Test Bureau, 1945 (free).

This brief bulletin gives concrete advice on problems faced in
test selection; it is illustrative of the good free material that is
sometimes available from test publishers.

Jordon, A. M.: Measurement in Education, New York:
McGraw-Hill [...]

This chapter, “[...] Measuring Instruments,” defines and
illustrates the qualities of good tests. A knowledge of the
meaning of these qualities is basic to sound test selection.

Traxler, Arthur E., Robert Jacobs, Margaret Selover, and Agatha
Townsend: Introduction to Testing and the Use of Test Results
in Public Schools, New York: Harper & Brothers, 1953, pp. [...]

[...] on particular problems [...] professional literature.


How Norms Help Us
Size Up Pupils

meaningful units for the measurement of human behavior in 
areas for which no adequate “yardstick” is yet available. The
attempt to develop such measures has resulted in the wide- 
spread use of norms. 


NORMS: MEANING AND DERIVATION 

When someone tells us that a certain girl is five feet tall, 
we may think of five feet of linear height as an absolute meas- 
ure, or we may think of persons of our acquaintance who are 
five feet tall. In order to make use of such a measurement, 
we would have to know the age of the girl. She may, for in- 
stance, be short for an adult or tall for an eleven-year-old. 
Hence, the significance of simple quantitative measurements 
depends upon comparisons and relationships. 

In the area of educational measurements, there exist no 
absolute units such as the inch or the foot. A test score, for 
example, is comprised of responses to a number of test items. 
The items are not all exactly alike, as are standard measures 
such as inches; rather, they vary in nature and in difficulty.

Let us say, for instance, that Jerry has a score of 43 on an 
arithmetic test. What does this score mean? Were there 43 or 
143 test items? Were the items related to addition, subtraction,
multiplication, division, or all of these? How difficult was the
test? The score of 43 becomes meaningful only when it is 
placed in a framework which enables us to make comparisons. 
If we find that 43 is one of the highest scores achieved by 
members of Jerry’s classroom group, the score takes on a 
value, for the important thing is not what score Jerry made 
on the arithmetic test, but how his score compares with other 
scores derived from the same measuring instrument. The con- 
cept of norms is based upon such comparisons. Test scores are 
placed in a framework which helps us to relate scores to one
another. Norms, then, are relative measures or derived scores
designed to help the teacher interpret test results within a
meaningful framework. 


When a test is administered and scored, the first result is a 
raw score, which is ordinarily the sum of the correct responses 
or the total of the values assigned to the several items included 
in the test. The raw score, like the score of 43 which Jerry
attained on the arithmetic test in the foregoing example, has
no definite meaning in itself. The teacher, like the test maker,
faces the problem of giving meaning to the test score. 

The teacher who wishes to make Jerry’s score more meaningful
may ask this question: “How good is a score of 43 on this test
in my classroom? Is it average, better than average, or below
average?” Arranging the papers in order of size of score,
highest to lowest, he may use the middle paper as a point of
reference. In this way he can evaluate a score of 43 as falling
in the upper half or the lower half of the scores. Or he may
establish a basis for comparison by finding the average score.
In either case, he has established a point of reference which
gives meaning to scores within this classroom group.

The test maker proceeds in many ways similar to those
employed by the teacher. He establishes a sample or group to
supply reference scores and then administers his test to this
group in order to study the results. To make the scores
meaningful, he may select a score at the midpoint of the
distribution. This score is called the median, or fiftieth
percentile. It divides the distribution into two equal halves:
half the subjects scored above this point and half below. Or he
may find the average, or mean, of all the scores to establish a
point of reference. The mean is the sum of scores divided by
the number of cases, and it represents typical performance for
a group. The typical score made by pupils at a grade level is
called the grade norm. The basis for
the norm, then, is the midpoint (median) or the average
(mean). In either case, the norm is a reference point presumed
to represent the level of attainment typical of the defined group.
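
The two reference points just described can be worked out in a few lines. The sketch below is only an illustration: the classroom scores are invented, and only Jerry's score of 43 comes from the example in the text. The function name `describe` is likewise an assumption made for this sketch.

```python
from statistics import mean, median

# Hypothetical raw scores for one classroom; only Jerry's 43 is from the text.
class_scores = [18, 22, 25, 27, 29, 30, 31, 33, 34, 36, 38, 40, 41, 43, 45]

def describe(score, scores):
    """Compare one raw score with the group's median and mean."""
    mid = median(scores)   # the middle paper when papers are ranked by score
    avg = mean(scores)     # sum of scores divided by the number of cases
    return {
        "median": mid,
        "mean": avg,
        "above_median": score > mid,
        "above_mean": score > avg,
    }

jerry = describe(43, class_scores)
print(jerry)
```

With these invented scores, either reference point places Jerry in the upper part of his group; with a different classroom the same raw score could fall below both.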

Procedures very similar to this may be used to develop 
norms for any defined grade or age group or for any other 
special classification, such as school beginners, college fresh- 
men, or graduate students. The norm, then, is a reference 
point derived from a study of the scores of a selected group. 
This group is called the standardization sample and should be
explicitly defined in the manual which accompanies the
standardized test.

The sampling procedures are, of course, very important to 
the interpretation of the results of the test for any group. The 
teacher will need to know whether the standardization group 
is in general similar to his group or in some significant way 
different from it. Norms on two tests are not necessarily equiv- 
alent, since the groups from which representative scores, or 
norms, were derived may differ. For example, in a recent report,
average IQs for the same pupils in one town varied from
99.9 on test A to 107.2 on test D on four well-known
intelligence tests.¹ For this group of pupils, test D appears to be
relatively easy as compared with test A. The norms for these
two tests are, therefore, not directly comparable so far as these
pupils are concerned.

In general, in establishing intelligence test norms, IQ 100 
is the average or median attainment for each age group. Thus 
the pupils in the example above differed markedly from the 
group which formed the original standardization sample, at
least in ability to respond to test D.

¹ Kenneth Eells, A. Davis, R. J. Havighurst, V. E. Herrick, and
R. Tyler, Intelligence and Cultural Differences, Chicago:
University of Chicago Press, 1951, p. [...]

Although the discussion above is in many respects an
oversimplification of the procedures involved in the establishment
of norms, it does appear to represent the basic principles and
problems involved. The procedures in test standardization
might be reiterated and expanded at this point as follows:

1. Test questions are gathered and a trial form of the test 
is developed. 

2. The trial test is administered to a large group of pupils 
at the age or grade level for which the test is designed. 

3. A careful study of results is conducted in an effort to se- 
lect the best items, and the test is drawn up in final form or 
forms. 

4. A large group of individuals is chosen as the basis for 
the derivation of test norms. Problems of grade or age levels, 
typicalness or representativeness, and other basic problems of 
classification of the sample are considered as this selection is 
made. 

5. The test is administered to the selected group under 
standard conditions of directions and time allowances, as 
outlined in the test manual, and the scores are studied. Typical
scores for various special classifications, such as age or
grade, are determined, and norms are established. 

6. Other problems which the test maker faces are: 

a. How valid is the test? 

b. How reliable is the test? 

c. How can scores from various forms of the test be 
made comparable? 

d. What kind of norms should be presented (age, grade, 
percentile, standard scores)? 

e. How can the results be interpreted and utilized? 

Information concerning these points is generally available in 
the test manual. 

Although many of the procedures listed above might be 
described in greater detail, it is clear that the standardization 
of a test is a long and arduous task requiring thought and a
high degree of skill on the part of the test maker. It is clear,
too, that norms have a very definite reference point: the
group or groups that comprise the standardization sample.
When the classroom teacher uses norms, he compares the test
results of his pupils with the over-all results from certain
specific groups. The teacher can generalize the results of testing
his group as a comparison with all pupils in a specific
classification only to the extent that the original group is indeed
representative of the entire population in the classification he
is using.


MAKING USE OF AGE AND GRADE NORMS 

Age norms show the standing of the pupil by relating his 
test score to the score which is typical of a particular age 
group. Age norms are developed as follows. 

The test is administered to various age groups, such as
children between the ages of 6 and 12 years. A typical score (the
mean or median) is found for each age group and perhaps for
each half year of chronological age. Suppose the results are
similar to those in Table [...]. The score column gives
the total range of test scores on a hypothetical vocabulary
test. Typical scores, or norms, are represented by the
heavily lined bars in the table. These may be the medians,
or midpoints, or they may be the average scores for each age
group. A score of 25 is typical of the six-year group, or the
six-year norm; 28, the seven-year norm; and so on. Ordinarily,
tests utilizing age norms are so designed that typical scores for
lower age groups are lower than those for higher age groups.
Thus a series of graduated age steps or levels is established.

The crosses in Table [...] represent the distributions of scores
for each age level. It will be noted that the scores vary around
the norm; that is, not all six-year-olds achieve a score of 25.
[...]

1. Suppose that June is eight years old. Her test score is 25.
Her vocabulary age is six years. This is interpreted as meaning
that on this vocabulary test June’s score is at the level typically
attained by the six-year-olds in our standardization sample.

2. John’s chronological age is nine years. His score is 37,
which is the score typical of eleven-year-olds in our sample.
His vocabulary age is eleven years on this particular test.
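
Reading a vocabulary age from a table of age norms amounts to finding the age group whose typical score lies nearest the pupil's score. The following is a minimal sketch under stated assumptions: the norms for ages six and eleven (25 and 37) follow the two examples above, every other entry in `AGE_NORMS` is invented, and the function name `age_equivalent` is hypothetical.

```python
# Typical (norm) scores by age for a hypothetical vocabulary test.
# Ages 6 and 11 use the scores from the text's examples (25 and 37);
# the other entries are invented purely for illustration.
AGE_NORMS = {6: 25, 7: 28, 8: 30, 9: 33, 10: 35, 11: 37, 12: 39}

def age_equivalent(raw_score, norms=AGE_NORMS):
    """Return the age whose typical score is closest to raw_score."""
    return min(norms, key=lambda age: abs(norms[age] - raw_score))

june_vocabulary_age = age_equivalent(25)   # score at the six-year norm
john_vocabulary_age = age_equivalent(37)   # score at the eleven-year norm
print(june_vocabulary_age, john_vocabulary_age)
```

The result says nothing about the pupil's chronological age; John, at nine, simply scores where the eleven-year-olds of this assumed sample typically score.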

In this way age norms enable the teacher to compare a
child’s score with the level of attainment of specified age
groups. In addition, this comparison is based upon specific test
materials. The test we have described is based on a sampling
of vocabulary items, and the scores are related to the
attainment of groups of children who comprised the standardization
sample. The interpretation and use of age norms must be
based upon a recognition of these factors in addition to those
that are common to all tests, such as validity.

The teacher will find a variety of tables of age norms
accompanying tests which are widely employed in [...]
materials are employed.

The mental age, or MA, is derived from scores on tests
which purport to measure intelligence, mental maturity,
general mental ability, and so on. The educational age, or EA,
is derived from sets of norms for age groups which
have been based on tests in the area of educational
achievement or scholarship. Reading-age norms are developed
on the basis of tests of reading attainment.

In all the above cases, norms must have reference
to (1) definite areas of ability and achievement, (2) the
specific items used to sample these areas of ability or
achievement, and (3) the group of pupils who formed the
standardization sample which purports to be representative
of the pupil population at the specified age levels.

Age norms help the teacher to think about children as they
are and as they compare with others. One of our serious edu- 
cational tasks is to locate the pupil in the academic milieu. 
Where shall we start with Mary? How well can Betty read as 
compared with other pupils? What materials are best suited 
to Anne’s needs and capacities? What level of achievement 
should be expected of Roger? Age norms are not the answers 
to problems; rather they serve as markers or guides which 
are of value in so far as their meanings and implications are 
clearly understood. 

Perhaps the type of norm most widely used is the grade 
norm, particularly at the elementary school level. Grade 
norms are developed in much the same way as age norms, but 
the reference point for grade norms is, of course, grade level 
rather than chronological age. The procedure might be sum- 
marized as follows: 

1. The test (for example, an achievement test) is admin- 
istered to specified grade groups, say, grades four through 
eight. 

2. Typical scores (medians or averages) are worked out 
for each grade level. Perhaps additional central scores are 
found for each half term at each grade level. 

3. These typical, or central, scores become the norms for 
the grades, and intermediary points representing number of 
months in the grade are developed. Hence the teacher finds 
that a test score of, say, 68 represents a grade placement of 
4.6, which would indicate that this score represents the typi- 
cal performance of pupils who have been in the fourth grade 
for a period of six months. Grade norms are ordinarily based 
upon the supposition of a ten-month school year. A study of 
the norms tables for any achievement test which gives grade
norms will indicate whether the base is a ten- or twelve- 
month year. 
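
The conversion of a raw score to a grade placement can be sketched as a table lookup with interpolation between grade levels. This is an illustration only: the entries in `GRADE_NORMS` are invented so that a raw score of 68 falls at 4.6, as in the example above, and the function name `grade_placement` is an assumption. An actual table must be taken from the manual of the test in use.

```python
# Hypothetical norm scores at the start of each grade (placement x.0).
# The numbers are invented so that a raw score of 68 corresponds to 4.6.
GRADE_NORMS = [(4.0, 56), (5.0, 76), (6.0, 96), (7.0, 116), (8.0, 136)]

def grade_placement(raw_score, norms=GRADE_NORMS):
    """Interpolate a grade placement in tenths (ten-month school year)."""
    if raw_score <= norms[0][1]:
        return norms[0][0]          # at or below the lowest tabled norm
    for (g_lo, s_lo), (g_hi, s_hi) in zip(norms, norms[1:]):
        if s_lo <= raw_score <= s_hi:
            fraction = (raw_score - s_lo) / (s_hi - s_lo)
            return round(g_lo + fraction * (g_hi - g_lo), 1)
    return norms[-1][0]             # above the highest tabled norm

print(grade_placement(68))
```

Each tenth stands for one month of a ten-month school year; a table built on a twelve-month base would need a different conversion.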

To read the table of norms, the teacher finds the pupil’s
raw score. Mary, for example, in the sixth grade scores 120
on a general-achievement test. [...] pupils [...]
months. This decimal part of the grade placement is
very probably an estimate or approximation, as norms are
not usually worked out for each month of the school year on
the basis of actual sampling; such a task would be endless.
However, this norm, the grade placement, helps the teacher
to understand his problem with Mary in a sixth-grade class.
Her scholastic attainment is superior by comparison with the
reference group of pupils used as a basis for the
standardization of the test, and she is likely to find the usual
sixth-grade materials quite easy and perhaps boring. Work
designed to challenge her should be about at the level suited
to beginning eighth graders or, at any rate, somewhat richer
and more varied than is suitable for typical sixth graders.
Thus the norm helps us to locate Mary on the academic ladder
and provides a reference point in planning for her.

Perhaps grade norms are popular with teachers simply
because of their familiarity with the grade system. The
elementary school teacher is quite familiar with
third, fourth, or fifth graders and the quality and level of
achievement of pupils in these grades, just as junior and
senior high school teachers are familiar with the level of
achievement of their grades. The concept of grades is deeply
ingrained in our educational thinking. Hence, the location of
a child in terms of a grade norm gives the teacher a basis for
planning regardless of the actual grade placement of the
pupil.

THE MEANING OF PERCENTILE RANKS

Comparing a child to others of his own age or grade is one of
the most useful ways of giving meaning to a test score. Age
norms, as we have seen, locate the pupil in terms of age
groups but not necessarily with pupils of his own age or 
grade. Comparisons with others of the same age or grade are 
most commonly made in terms of the percentile rank of the 
score a pupil attains on a given test. Percentile ranks indicate 
the relative standing of the pupil in a defined age or grade 
group. 

If, for example, Jean’s score on a particular test places her 
at percentile 20 for fifth-grade pupils, this means that her 
score is better than those of 20 per cent of the fifth graders 
who comprised the standardization sample. However, 80 per 
cent of this group of fifth graders made higher scores than 
Jean. The norm indicates Jean’s standing among fifth graders, 
and it must be understood that the 20 does not refer to 20 
per cent knowledge of the test area or imply that Jean an- 
swered 20 per cent of the questions correctly. 
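
Jean's case can be illustrated with a small calculation. The sketch below assumes the definition used in this chapter (the per cent of the norm group scoring below the pupil); the ten scores and Jean's raw score of 19 are invented, and the function name `percentile_rank` is hypothetical.

```python
def percentile_rank(score, group_scores):
    """Per cent of scores in the norm group that fall below the given score."""
    below = sum(1 for s in group_scores if s < score)
    return 100 * below / len(group_scores)

# Invented scores for ten fifth graders forming a tiny "norm group".
fifth_grade_sample = [12, 15, 21, 23, 24, 27, 30, 31, 35, 38]

jean_raw_score = 19   # invented raw score, say on a 50-item test
jean_rank = percentile_rank(jean_raw_score, fifth_grade_sample)
print(jean_rank)
```

Two of the ten pupils score below Jean and eight above, so her rank is 20. Her raw score of 19 on an assumed 50-item test would be 38 per cent of the items, which shows that the rank and the per cent correct are different quantities.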

The example above indicates further that the percentile
norm is a separation point in a distribution. That is, a
percentile rank of 85 is a point which separates the upper 15 per
cent from the lower 85 per cent of the group. Hence, this
type of norm gives us the relative rank of pupils, an
indication of their standing in a hypothetical group of one
hundred.

One of the difficulties encountered in utilizing ranking
measures such as percentile norms is the fact that the ranks
do not represent equal units of measurement. The foot rule,
for instance, is divided into twelve inches, each identical
with every other. Suppose that the inches on a ruler were not
of uniform length; that is, that those at each extreme are
excessively long and those in the center excessively short,
as in the diagram in Figure 2. Clearly the inches have
different meanings as units of measurement, depending upon
their location on the rule.

An analogous situation prevails in the use of percentile
ranks as units of measurement. These units are
“shorter” near the center of any distribution in the sense that
they imply smaller differences in actual score points near the
center of the distribution; they are “longer” near the extremes
in the sense that the differences in actual score points
are greater at the extremes. That is, a difference of five
percentile ranks near the upper extreme (say, percentile
[...]) may mean a difference of twenty score points. On the
other hand, a difference of five percentile ranks near the center
of the distribution (say, percentile 50 to 55) may mean a
difference of only two or three score points.


Fig. 2. Illustration of unequal units. The diagram
represents a foot rule so divided that the units of measurement or
“inches” are of unequal proportions.

This actual inequality of differences which appear alike
when converted to percentile ranks is due to the fact
that large numbers of individuals score in the middle
of the distribution and few at either extreme. Since
percentile ranks are based on the per cent of cases
included at or below certain points, the ranks jump rapidly
wherever large frequencies occur. Conversely, percentile
ranks increase slowly where frequencies are small.
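
The unequal size of percentile units can be checked numerically. The sketch below assumes, for illustration only, that scores follow a normal distribution with a mean of 50 and a standard deviation of 10; the function name `score_gap` and the particular percentile pairs are choices made for this sketch, not figures from the text.

```python
from statistics import NormalDist

# Assumed score distribution: normal, mean 50, standard deviation 10.
scores = NormalDist(mu=50, sigma=10)

def score_gap(lower_pct, upper_pct, dist=scores):
    """Score points separating two percentile ranks in the distribution."""
    return dist.inv_cdf(upper_pct / 100) - dist.inv_cdf(lower_pct / 100)

center_gap = score_gap(50, 55)    # five ranks near the middle
extreme_gap = score_gap(94, 99)   # five ranks near the upper extreme

print(round(center_gap, 1), round(extreme_gap, 1))
```

Under these assumptions the five ranks near the top span several times as many score points as the five ranks near the middle, which is the ruler with unequal inches expressed in numbers.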

To use another analogy, consider a group of a hundred
persons running a race. One runner is outstanding and defeats
the others. He is better than ninety-nine, and his percentile
rank is 99. The runner who finishes second may be quite
some distance behind the first, but this distance does not
affect his rank order. He is better than ninety-eight of the
runners; hence his percentile rank is 98. A large group of
average runners are probably massed at one point on the race
track and reach the finish line in a group. The one who finishes
fiftieth is better than fifty of the runners, so his percentile rank
is 50. He has probably just barely defeated a number of others
who are very close behind him, but he ranks 50 and the
next runner 49 regardless of the very short distance between
the two.

At the slow extreme, perhaps one runner is trailing far
behind. He is last, and his percentile rank is zero. The nearest
runner in front may be far ahead of him, but that nearest one
is better than only one runner, the last. His percentile rank is
one regardless of the distance that separates him from the
last man.

Figure 3 shows this situation for ten runners. It is apparent
that percentile rank depends entirely on the number of people
at a point in the distribution. Thus rank is independent
of actual score value or number of items answered correctly.
The teacher who utilizes percentile ranks should keep in mind
the fact that these “norms” represent variable units of
measure, especially when he is tempted to make distinctions among
pupils on the basis of percentile-rank differences.

Fig. 3. [...] a race. The winner, at A, is better than the other
nine runners (90 per cent); therefore his percentile rank is 90.
His percentile rank is ten “points” better than the rank of runner
number two, at B, who attains the percentile rank of 80, since
runner number two is better than eight of the ten runners, or
80 per cent. Note, however, the long distance which separates
the two runners at A and B.

The runners in area C are grouped closely together, but each
achieves ten percentile-rank “points” above the nearest succeeding
runner, since each comprises 10 per cent of the total group. Ten
percentile “points” at C appear to be the same as ten percentile
“points” at A but give no information as to the actual distance
separating the runners. This grouping near the center of a
distribution, as at C, is a common feature of the score
distributions of educational and mental tests.
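
The race of ten runners can be put in numbers to show that rank ignores distance. In this sketch the finishing times are invented: the winner is far ahead, the middle runners are bunched, and the last runner trails badly. The function name `runner_percentile_ranks` is hypothetical.

```python
# Invented finishing times in seconds for ten runners.
times = [52.0, 58.5, 60.0, 60.1, 60.2, 60.3, 60.4, 60.5, 62.0, 75.0]

def runner_percentile_ranks(finish_times):
    """Percentile rank of each runner: per cent of the field he defeats."""
    n = len(finish_times)
    return [
        100 * sum(1 for other in finish_times if other > t) / n
        for t in finish_times
    ]

ranks = runner_percentile_ranks(times)
for t, r in zip(times, ranks):
    print(f"time {t:5.1f} s  percentile rank {r:4.0f}")
```

Every runner stands exactly ten rank points above the next, although the gaps in time range from a tenth of a second in the bunched group to many seconds at the two extremes.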

In spite of these characteristics, percentile norms are as 
valuable to the teacher as any ranking system, for they indi- 
cate the relative standing of the pupil in a defined age or 
grade group and thus help the teacher understand the pupil, 
plan for him, and develop reasonable expectations for him. 
They “locate” the pupil among others of his own age or 
grade. 

STANDARD SCORES AND MEASUREMENT 
PROBLEMS 

Standard scores represent the attempt to develop equal 
units of measurement. As we have seen, percentiles are in real- 
ity ranks which have no uniform reference to size of score. 
Standard scores, on the other hand, are determined by the
number of points the pupil scores rather than by his rank 
order in a group. Standard-score units are equal throughout 
the distribution. 

A major problem in educational measurement is the fact 
that there is no such thing as an absolute zero. In other 
words, there is no beginning point, as there is with linear 
measurement. Where, for example, does intelligence begin?
Is there such a thing as zero intelligence? Since such starting
points cannot be established in developing measurement
systems based on standard scores, a central point is used as a
reference. This point is the mean, or average, score. In deriving
standard scores, then, the first step is to locate the average
score of the group.

The next step is based on the assumption that score dis- 
tributions will normally follow a rather definite pattern. This 
pattern is called the normal curve and is represented by a 
symmetrical, bell-shaped curve with definite mathematical 
properties, as shown in Figure 4. The high parts of the curve
represent the highest frequencies of scores, at C, M (the
mean), and D. At B and E the shorter vertical lines represent
fewer scores as we approach either extreme. At the extreme
left and right, the extremely low and high ends of the
distribution, scores are very infrequent; this, as we have seen, is
the distribution found where large numbers of persons are
measured with respect to almost any characteristic. For example,
in the height of adults, very few persons are extremely short,
most are about average in height, and very few again are
extremely tall. This, then, is the type of distribution we expect
to find (or normally find) when adequate
measures of human characteristics are applied to large
groups selected at random from the population. Standard- 
score systems are based upon this assumption and may be de- 
veloped in relation to distributions which approximate the 
form of this normal curve. 

Assuming that the bell-shaped curve represents the normal
or expected distribution of test scores, the problem resolves
itself into that of making allowances for the high frequency
of scores around the center of the distribution and the low
frequency at the extremes. To do this, scores near the
extremes are given added weight depending upon the extent of
their deviation (in score units) from the average score. The
greater the deviation of the score from the center, the greater
its weighting. This process gives a result analogous to that
shown in Figure 5.

Fig. 5. Diagrammatic representation of the weighting of standard
scores in terms of deviation from the mean. (Base line: Low,
Average or mean, High.)

[...] Figure 6 with a score of 50 representing the mean or average.
The base line in this figure illustrates the fact that standard-
score units are equal in base-line length. The base line here
represents score points.



C. Approximate per cent of scores falling below standard-score
points 30, 40, 50, 60, and 70: 2%, 16%, 50%, 84%, 98%

Fig. 6. Standard scores, showing equal base-line units and approxi- 
mate proportions of scores included below stated standard-score points. 

Opposite A above are shown the proportions of the population 
likely to receive standard scores within the limits specified. For ex- 
ample, approximately 14 per cent of the population are likely to re- 
ceive standard scores between 30 and 40 or between 60 and 70. 

Opposite B are listed standard-score intervals ranging from 10 to 
90. 

Opposite C are listed the per cents of the population likely to re- 
ceive standard scores below the specified points. For example, ap- 
proximately 2 per cent receive standard scores of less than 30, and 
98 per cent receive standard scores of less than 70. 

Note: The explanations above are approximate and are based upon 
the assumption of large or representative populations yielding distri- 
butions of scores approximating the normal distribution. 


The per cent of the population likely to score below given
standard scores is indicated by the numerals opposite C. For
example, two persons out of a hundred are likely to score
below standard score 30, and sixteen out of a hundred below
standard score 40. Only 2 per cent of the population exceeds
standard score 70. The percentages employed here are based
on the theoretical normal curve.
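
The percentages quoted for the theoretical normal curve can be reproduced with a short calculation. The sketch assumes a standard-score scale with a mean of 50 and ten points to each standard deviation, and it assumes that scores are normally distributed; the function name `per_cent_below` is hypothetical.

```python
from statistics import NormalDist

# Assumed standard-score scale: mean 50, ten points per standard deviation.
scale = NormalDist(mu=50, sigma=10)

def per_cent_below(standard_score, dist=scale):
    """Approximate per cent of a normal population scoring below a point."""
    return round(100 * dist.cdf(standard_score))

row_c = {point: per_cent_below(point) for point in (30, 40, 50, 60, 70)}
print(row_c)
```

The rounded results agree with the approximate figures of 2, 16, 50, 84, and 98 per cent; for a group whose scores depart markedly from the normal form, the actual percentages will differ.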

The uses and advantages of standard scores may be sum- 
marized as follows: 

1. Standard scores represent equal units of measurement
and hence facilitate comparisons regardless of the area of the
distribution of scores under consideration.

2. Standard scores from different tests are comparable to
the extent that they may be averaged or combined (if the
assumption of normality seems to be warranted). This applies
even though the tests may contain different numbers of items
and one test may be more difficult than the other.

3. Standard scores are based essentially on score points of
the test rather than on rank order. This facilitates
interpretations in terms of ability or achievement as represented
by test score and indicates relative standing in the group at the
same time.

4. Mathematically, standard scores have other values. The
zero point is always the mean (standard score 50 as described
here). The mean is the most stable measure of central
tendency, that is, the most stable central score. The range between
standard scores of 40 and 50 represents one standard deviation
below the mean (or average). The standard deviation is the
most reliable measure of variability within the group and
has many statistical uses. (See Figure 7.)
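
The second point above, that standard scores from different tests may be averaged, can be sketched as follows. The raw scores, the two test names, and the function name `standard_scores` are invented for illustration. The conversion uses a mean of 50 and ten points per standard deviation, as described here, and takes the group's own mean and standard deviation as the reference, which is an assumption of this sketch rather than the procedure of any particular published test.

```python
from statistics import mean, pstdev

def standard_scores(raw_scores):
    """Convert raw scores to standard scores with mean 50 and SD 10."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)   # standard deviation of the whole group
    return [50 + 10 * (x - m) / sd for x in raw_scores]

# Invented raw scores for the same five pupils on two different tests.
arithmetic = [10, 20, 30, 40, 50]   # a shorter, harder test
reading = [90, 82, 98, 86, 94]      # a longer, easier test

arith_std = standard_scores(arithmetic)
read_std = standard_scores(reading)

# On a common scale the two results for each pupil may be averaged.
combined = [(a + r) / 2 for a, r in zip(arith_std, read_std)]
print([round(c, 1) for c in combined])
```

Averaging the raw scores directly would let the reading test, with its larger numbers, dominate the result; on the common scale each test carries equal weight.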


NORMS AND THE TEST MANUAL 

Before giving a test, the teacher should read carefully the 
directions for administering it and the descriptions of norms
presented in the manual of directions. It is important that the
teacher follow these directions, since the norms have been
developed on the basis of these specific regulations and
requirements, and any deviation from the required procedure
may invalidate them. Pupil scores can be interpreted on the 
basis of the norms provided only when the test has been ad- 
ministered under the standard conditions set forth in the 
manual. The directions for scoring the test must be care- 
fully followed, since deviations from the standard methods of 
scoring also will invalidate the norms. 


The test manual will also include a description of the types 
of norms provided, and the teacher should study these
descriptions carefully in order to interpret the norms correctly. For
example, we have defined percentile rank as a point of
separation which marks off one proportion of the group from
another. In other sources the teacher may read that a pupil’s
percentile rank indicates the per cent of the pupils in the
group that he equals or excels in score on the given test.²
Interpretation of pupil scores will vary with the definition of
norms, and it is important that the teacher study the definition 
given in the manual of directions for the particular test he is 
giving. Only on this basis are accurate interpretations possible. 
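
The effect of the two definitions can be seen when several pupils are tied at the same score. The sketch below is an illustration with invented scores; the function names `rank_below` and `rank_equal_or_below` are hypothetical labels for the two definitions mentioned above.

```python
def rank_below(score, group):
    """Per cent of the group scoring strictly below the given score."""
    return 100 * sum(1 for s in group if s < score) / len(group)

def rank_equal_or_below(score, group):
    """Per cent of the group that the score equals or excels."""
    return 100 * sum(1 for s in group if s <= score) / len(group)

# Invented norm-group scores with three pupils tied at 30.
norm_group = [20, 24, 27, 30, 30, 30, 33, 36, 40, 45]

by_separation = rank_below(30, norm_group)
by_equal_or_excel = rank_equal_or_below(30, norm_group)
print(by_separation, by_equal_or_excel)
```

The same raw score of 30 is reported as rank 30 under one definition and rank 60 under the other, which is why the definition given in the manual must be known before ranks are interpreted.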

In addition, the teacher must examine the test manual to
discover the nature of the sample population on which the
norms were based, for they will be of little value for use with
pupils who differ markedly from the normative groups. For
example, children of migratory workers and children in
isolated areas are not likely to compare favorably with
the norms usually presented with educational tests.


SUMMARY 

[...] evaluation is [...] development of norms has facilitated educational evaluation
to the extent that they give the teacher a basis for comparison
of the pupil with others of specified ages or grades. The purpose
of norms is to place scores on tests in a framework
which helps the teacher to make comparisons or to relate
scores to one another.

2 [...] New York: Harper & Brothers, [...].

Norms are derived from a study of the test scores attained
by large groups of pupils of specified age or grade.
They are measures which relate the score of one pupil to the
scores of others. Among the kinds of norms commonly employed
are age, grade, and percentile norms, and standard scores.

Age norms relate the [...] typically attained by pupils of a
specified age, and have been developed for mental age, [...]
reading age. Mental age is derived from scores on tests of
mental maturity, mental ability, or intelligence. Educational
age relates the [...] attainment in skills and
understandings [...] typical [...] specified grade levels. [...] in
connection with tests of educational achievement,
of aptitude or attainment, [...] basic academic
fields or as a measure of [...].

Grade norms help the teacher determine the level of
achievement of the pupil [...] school grade.

Percentile norms or ranks [...] the relative standing of
the pupil in a defined group such as [...] grade classification.
They are rank-order measures which have no reference
to size of score except as this [...] in a series
of scores. These norms are [...] widely used, and they
are meaningful and helpful, as they [...] pupil’s standing
among stated groups with reference to the measured characteristics.

Standard scores represent equal units of measurement, have 
a wide range of application, and facilitate accurate interpreta- 
tions. They relate a pupil’s score to scores attained by specific 
groups, such as age or grade groups. 

The teacher who plans to utilize the norms presented in 
test manuals should adhere strictly to the directions for ad- 
ministering and scoring the test, as the norms are derived on 
the basis of these standard procedures. He should also study 
the definitions, methods of derivation, and interpretation of 
the norms in order to evaluate and interpret them and to un- 
derstand the implications of certain variations in definition 
and procedure which may apply to the specific test he is using. 


STUDY AND DISCUSSION EXERCISES 

1. Define the term norm. Indicate the essential difference be- 
tween raw scores and norms. 

2. In what ways may the concept of norms be misused by
teachers in the evaluation of pupils in the classroom situation?

3. A pupil has a mental age of ten years two months according
to a group test of mental ability. What additional data would be
necessary for an adequate interpretation of this derived score? Give
your reasons for considering each item essential to interpretation
of the test result.

4. A standardized achievement test provides age, grade, and
percentile norms as bases for interpretation of results. Cite the
advantages and disadvantages of each of the norms presented.

5. Mary, a fifth grader, achieves a score of 65 on the
arithmetic section of a standardized achievement test. This score
is equivalent to percentile 70. How would you interpret this score
in terms of the available data?

6. Compare standard score and percentile norms. What are the
particular advantages of each?

7. Differentiate between norms and standards of achievement.

8. Johnny has taken three different standardized general-achievement
batteries during the past few weeks. His scores, when
converted to grade norms, differ. On battery A his achievement 
is at grade level 4.6, in terms of the norms for the test. On battery 
B his achievement places him at grade level 5.7. He scores at 
grade level 6.1 on battery C. Assuming that scores on all three 
batteries ordinarily have a high degree of reliability, try to account 
for the differences in performance. 

SUGGESTED ADDITIONAL READINGS 

Freeman, F. S.: Theory and Practice of Psychological Testing, 
New York: Henry Holt and Company, Inc., 1950. 

Chapter I presents an overview of the problem of psychological 
measurement. 

Froehlich, Clifford P., and Arthur L. Benson: Guidance Testing, 
Chicago: Science Research Associates, Inc., 1948. 

Chapter 3 includes a discussion of the values and limitations 
of norms. 

Greene, Edward B.: Measurements of Human Behavior, New 
York: The Odyssey Press, Inc., 1952. 

Chapter 12 presents a discussion of the interpretation of test 
scores. 

Micheels, William J., and M. Ray Karnes: Measuring Educational 
Achievement, New York: McGraw-Hill Book Company, Inc., 
1950. 

Chapter 1 contains an excellent discussion of the nature of 
measurement with particular reference to indirect and relative 
measures. 

Monroe, Walter S. (ed.): Encyclopedia of Educational Research, 
New York: The Macmillan Company, 1950, pp. 785-802. 

A discussion of the types, comparability, and interpretations of 
norms. 



CHAPTER FIVE 


Estimating Capacity for Learning 


Among the teacher’s more urgent problems is that of estimat- 
ing capacity for learning — of discovering what scholastic per- 
formance to expect of pupils. Is Jim learning as rapidly and 
as well as can be expected? Is the classroom work too easy or 
too difficult for Mary? What kinds of materials are best suited 
to Don’s abilities? Is Jane’s poor work in school due to lack 
of ability or to some other factor? These are among the every- 
day problems of the classroom teacher. 

It is as important to discover the pupil’s capacity for learn- 
ing as to determine what he is learning. It is widely recog- 
nized that pupils differ markedly in ability to succeed in 
schoolwork and that therefore uniform standards of achieve- 
ment for all are unrealistic and undesirable. No teacher ex- 
pects all children to conform to a given standard in height or 
weight; and certainly every teacher recognizes that forced 
feeding, stretching, pulling, pushing, or any other kind of pres- 
sure would be entirely useless in adjusting the pupil’s height 
to classroom standards. Although it is sometimes less obvious, 
educational “pressure” techniques are no more likely to equal- 
ize the learning capacity of pupils. 

Recognition of the existence of individual differences implies
acceptance of these differences as educational facts
which form the background for the teacher’s work. Indeed, a 
major function of testing is to enable teachers to understand 
and work with the differences that inevitably exist, such as 
Johnny’s not learning so readily as Billy. The only satisfactory 
educational practice is to exert every possible effort to adapt 
the curriculum to the educational potentialities of individual 
pupils. 

Adapting the curriculum to the learning potentials of individual
pupils means careful study by the teacher of the aptitudes
of his pupils for schoolwork. By utilizing the evidence
from tests of intelligence — or tests of scholastic or educa- 
tional aptitude, as they might more properly be called — the 
teacher gathers objective data upon which to base judgments 
and expectations. Tests do not give answers; they provide 
data that are valuable to the teacher to the extent that he uses 
them wisely in making judgments. 

WHAT INTELLIGENCE TESTS MEASURE 

The definition of intelligence is subject to a great deal of 
controversy involving a number of points of view. Since our 
purpose here is to help the teacher use test results effectively, 
we shall limit our discussion to the kinds of abilities that are 
commonly evaluated by so-called “intelligence tests.” The 
teacher is urged to examine the tests he uses, read the man- 
uals carefully, and decide what specific abilities are tested by 
the material. An intelligence-test score, like any other test 
score, is significant only for the materials included in the par- 
ticular test, and the teacher’s interpretation of the score must 
first of all be related to the kinds of performances required of 
pupils by the test. 

Items commonly included in intelligence tests are designed
to sample abilities such as the following:

1. Memory: immediate or delayed, meaningful or rote.

2. Ability to deal with verbal materials (vocabulary). 

3. Ability to deal with spatial relationships or to orient the 
self in space. 

4. Ability to deal with verbal relationships (analogies, op- 
posites). 

5. Ability to deal with numerical materials either as sheer 
facility with numbers or as ability to reason numerically 
or quantitatively. 

6. Ability to find the guiding principle involved in tasks 
which may be verbal, numerical, spatial, or pictorial in 
nature. 

7. Ability to perceive essential details, make fine distinc- 
tions, and notice similarities. 

Tests of mental ability may include material on as few as 
three or four of the aspects of intelligence outlined above or 
on all or almost all of them. One test may deal primarily with 
verbal materials; another may require manipulation of blocks, 
the solution of puzzles, or the interpretation of pictures. For 
example, vocabulary may be tested by items such as the fol- 
lowing, which differ in the emphasis placed upon language 
skills: 

Words: a vocabulary test of the type which requires some 
reading skill. The pupil selects the one word, A, B, C, or D, 
which has the same meaning as the initial word. 

1. big: A fair  B windy  C soft  D large

Pictures: a test of vocabulary which does not require reading
skill. The teacher pronounces the word dog. The pupil
finds and indicates the picture which corresponds to the
spoken word.1

1 L. L. Thurstone and T. G. Thurstone, S.R.A. Primary Mental
Abilities, Elementary Form AH, Chicago: Science Research Associates,
Inc., 1948.



Again, reasoning ability may be evaluated by items which 
differ in the degree of emphasis upon verbal skills. Of the fol- 
lowing, the word-grouping item utilizes verbal materials, 
whereas in the figure-grouping item language skill does not 
play an essential part. 

Word grouping: The pupil is required to indicate the word 
which does not belong with the others. 

A red B blue C heavy D green 

Figure grouping: The pupil is asked to indicate the figure
which does not belong with the group.2

[Figure: figure-grouping item showing four figures labeled A, B, C, and D]

Items like the following, in which the pupil selects the ap- 
propriate analogy, may be of the language or nonlanguage 
type: 3 


[Figure: Practice Exercise 3, a nonlanguage analogy item with five numbered answer figures]

63: Land is to peninsula as ocean is to: 1 gulf, 2 lake, 3 cape, 4 river, 5 island.

2 Ibid.

3 V. A. C. Henmon and M. J. Nelson, The Henmon-Nelson Tests
of Mental Ability, Form B (Grades 7-12), Boston: Houghton Mifflin
Company, 1932.

The particular emphasis of the test items will be reflected 
in the results of the tests for each pupil. Jimmy, for instance, 
has little facility with words and will not do very well on a 
test which is heavily loaded with verbal items. On the other 
hand, he may shine on a test which requires manipulation of 
materials and the solution of problems involving concrete ob- 
jects or pictorial situations. But the fact that the items in the 
verbal test do not tap his best abilities does not mean that it 
is useless for Jimmy to take the test. The test will indicate 
Jimmy’s weakness in verbal areas, and this is valuable in- 
formation for the teacher, whether it is news or merely a con- 
firmation of opinion. 

The teacher who is interpreting Jimmy’s test results, how- 
ever, must keep in mind that a verbal test represents a 
sampling of only a limited number of abilities. The teacher 
will recognize the significance of more inclusive measures. 
The best way to promote Jimmy’s development in verbal 
skills and meanings may be through the areas in which he 
demonstrates greatest capacity for learning. When this is the 
case, the teacher should seek a test or a battery of tests which 
gives a more inclusive picture of the pupil’s capacities. 

It is particularly important that a test which samples vari- 
ous areas of intelligence rather than probing “general” intel- 
ligence be used for adolescent students. Recent investigations 
have indicated that specific abilities become more sharply dif- 
ferentiated as the individual develops toward maturity. In sta- 
tistical terms, it has been found that intercorrelations between 
such characteristics as memory, verbal abilities, and number 
facility are not so extensive in fifteen-year-olds as in twelve-year-olds.
“The correlations from these various studies indicate
clearly that different aspects of intelligence are being
measured and that independence of mental traits, or differentiation
among traits, increases with age through the adolescent
years.”4 Two practical implications may be derived from
these data. (1) In studying adolescents, tests of general mental
ability will be less informative than tests that provide a
profile of abilities. (2) As the child grows, teachers find more 
opportunities to capitalize upon and thus promote the devel- 
opment of the specific abilities as these become more clearly 
differentiated. 

It is well, also, to understand that there are aspects of adjustment
and functioning that are not measured by intelligence
tests. For example, the pupil’s determination to use
what intelligence he has to best advantage is not measured. 
Health conditions and drive may have a deleterious or invig- 
orating effect upon intellectual functioning. The paucity or 
richness of experience conditions the development and func- 
tioning of intellectual capacity. Experiential background is 
not likely to be indicated. Social intelligence — ability to get 
along with others without excessive emotional tension — has 
not yet been isolated as an aspect of intelligence, yet it does 
affect schoolwork and personal adjustment. 

THE MEANING OF MENTAL AGE 

Most intelligence tests utilize the concept of mental age 
(MA) as a basis for the interpretation of results. The child’s 
test score is related to the age group of which his score is 
typical. For example, Betty has a raw score of 107 on a men- 
tal test. The table of norms indicates that this score is equiva- 
lent to a mental age of ten years two months. This means that 
Betty’s score on this test is the score typically attained by 
children who are ten years two months of age. Betty herself 
may be eight or twelve years old chronologically. Her score, 
however, is more typical of the age group ten years two
months than of either eight- or twelve-year-olds. Betty, in
other words, achieves at the ten-year two-month norm on this
test and is said to have an MA of ten years two months.

4 David Segel, Intellectual Abilities in the Adolescent Period, Federal
Security Agency, Office of Education, Bulletin 1948, no. 6, p. 10.

In certain respects the concept of mental age is of more 
value to the teacher than that of IQ, which is, however, more 
commonly used. Mental age approximates the use of grade 
norms on achievement tests; that is, it relates Betty to a spe- 
cific age group in mental ability. This may help the teacher 
to develop suitable expectations for Betty with reference to 
schoolwork. 
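
The conversion of a raw score into a mental age, as in Betty’s case, amounts to a lookup in the table of norms. The sketch below assumes a hypothetical norms table: the cutoffs are invented for illustration and come from no published test. Only the pairing of a raw score of 107 with ten years two months follows the example in the text.

```python
from bisect import bisect_right

# Hypothetical norms table: (lowest raw score, mental age in months).
# These values are invented for illustration only.
NORMS = [(95, 114), (99, 116), (103, 118), (105, 120), (107, 122), (110, 124)]


def raw_score_to_mental_age(raw_score):
    """Return the mental age in months whose score band contains raw_score."""
    cutoffs = [cutoff for cutoff, _ in NORMS]
    index = bisect_right(cutoffs, raw_score) - 1
    if index < 0:
        raise ValueError("raw score falls below the range of the norms table")
    return NORMS[index][1]


def format_age(months):
    """Express an age in months as years and months."""
    years, remainder = divmod(months, 12)
    return f"{years} years {remainder} months"


print(format_age(raw_score_to_mental_age(107)))  # 10 years 2 months
```

A real manual supplies its own table, and scores outside the table's range call for the manual's instructions rather than extrapolation.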

The teacher may be able, on the basis of his knowledge of 
children at various age levels, to decide upon materials which 
will be suitable for Betty, regardless of her actual grade place- 
ment. Suppose that she is twelve years old and has a mental 
age of ten years. This does not imply that she is just like chil- 
dren ten years of age; in interests, experiences, social develop- 
ment, physique, and possibly in many respects mentally, she 
will differ from “typical” ten-year-olds. However, in terms of 
the mental abilities sampled by the test, it appears that she 
resembles the ten-year-old group in mentality. The index is 
admittedly rough, but it is perhaps the best available basis for 
the teacher’s dealings with Betty, and it suggests some possible 
approaches to Betty’s learning problems. 

Many teachers tend to think in terms of grade groups 
rather than age groups. Possibly this is in part the reason for 
the popularity of grade norms as a basis for thinking about 
achievement-test results. Mental ages can be converted into 
mental grade placements. For example, Betty’s mental age re- 
lates her, in terms of the mental-test sampling, to the ten- 
year-old group, and youngsters of this age are typically found 
in the fifth grade in school. Betty functions mentally about at 
the level, then, of typical fifth graders. This information gives 
the teacher a worthwhile starting point for his attempts to 
adapt the curriculum to Betty’s potentialities.

Betty’s mental grade placement does not necessarily indicate
a fifth-grade program for her, since she may differ in
many respects from typical ten-year-olds. Although the cur- 
riculum which suits her best will probably involve a level of 
mental functioning comparable to that expected of ten-year- 
olds, differences in experience and level of social develop- 
ment and the fact that she has been in school and has had 
previous contact with a wide variety of curricular materials 
will influence the teacher in choosing the learning materials 
and processes suited to Betty’s needs and capacities. This de- 
cision concerning Betty’s program illustrates the principle we 
have already discussed — that test data should be used to sup- 
plement, correct, or confirm the observations made by the 
teacher. The teacher must then look for evidences of interest, 
boredom, nervous tension, or enjoyment of suggested tasks 
to see how well the materials tentatively being tried are suited 
to Betty’s needs. 
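
As a rough illustration of the conversion from mental age to mental grade placement, the sketch below assumes that typical pupils are five years older than their grade number, which matches the text’s pairing of ten-year-olds with the fifth grade. The offset is an assumption that varies with school entry practice, and the function name is illustrative.

```python
def mental_grade_placement(mental_age_months, age_minus_grade=5):
    """Approximate grade whose typical pupils are of the given mental age.

    age_minus_grade is an assumed offset between typical age and grade
    number; it is not taken from any test manual.
    """
    return mental_age_months / 12 - age_minus_grade


print(mental_grade_placement(120))  # 5.0, i.e. about fifth grade
```

The figure is only a starting point for the teacher’s judgment, for the reasons given above.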

The value of the mental-age concept is further illustrated by 
the following approximate equivalents: 

Case A: The IQ of pupil A is 100. His chronological age 
upon entering school was five years six months; because of his 
birth date, he was able to enter school at a relatively early age. 
His MA (five years six months) indicates that very possibly 
he will not be ready to begin reading during his first year in 
school. 

Case B: The IQ of pupil B is 90. He was delayed in enter- 
ing school for one year because of illness. His chronological 
age is seven years two months; his MA is six years five 
months. Under ordinary circumstances pupil B is likely to be 
ready for reading even though his IQ is lower than that of 
pupil A, since mental age is more closely related to reading 
readiness than is IQ. 

Case C: Pupil C has an IQ of 126. His chronological age is 
six years four months; his MA is eight years. Pupil C is likely 
to have little difficulty with the first-grade program in reading 
and may need more challenging materials than his classmates 
require. 

The interpretations in these three cases above are clarified 
through the use of mental-age rather than IQ concepts. How- 
ever, it must be repeated that many factors aside from tested 
mental ability enter into a child’s readiness for learning. The 
possibilities suggested above should be regarded by the 
teacher as tentative hypotheses that may have to be changed 
as a result of other circumstances in the child’s classroom life. 
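
Because the IQ is the ratio of mental age to chronological age multiplied by 100, the mental ages in Cases A, B, and C can be checked by turning the ratio around: MA = IQ × CA / 100, with ages in months. A minimal sketch follows; the function and variable names are illustrative, and results are rounded to the nearest month.

```python
def mental_age_from_iq(iq, ca_years, ca_months):
    """Mental age in months implied by a ratio IQ and a chronological age."""
    chronological_months = ca_years * 12 + ca_months
    return round(iq * chronological_months / 100)


# (IQ, chronological age in years, additional months) for the three cases.
cases = {"A": (100, 5, 6), "B": (90, 7, 2), "C": (126, 6, 4)}

for name, (iq, years, months) in cases.items():
    ma_years, ma_months = divmod(mental_age_from_iq(iq, years, months), 12)
    print(f"Case {name}: MA = {ma_years} years {ma_months} months")
```

The results agree with the approximate equivalents given in the three cases: five years six months, six years five months, and eight years.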

Again, the teacher must evaluate test materials carefully. 
Mental age is an average based upon a particular sampling of 
mental abilities; it may cut across few or many abilities, and 
it must be interpreted in terms of the test from which it is de- 
rived and the kinds of problems the test requires the pupil to 
solve. 


THE MEANING OF IQ 


The term intelligence quotient (IQ) is firmly established in 
the vocabulary of teachers. The IQ is the ratio of a pupil’s 
mental age to his chronological age and is found as follows: 


$$\mathrm{IQ} = \frac{\mathrm{MA}}{\mathrm{CA}} \times 100$$


In order to compute the IQ of a pupil, the MA is derived 
from an intelligence test and converted into months. This 
term is divided by the chronological age (CA), also con- 
verted into months. The result of this computation is then 
multiplied by 100, thus clearing two decimal places. 

By way of example, let us suppose that Billy, whose chron- 
ological age is ten years four months, has completed an intel- 
ligence test which shows him to have a mental age of eleven 
years one month. His IQ can be calculated as follows: 


CA = 10-4, or 124 months
MA = 11-1, or 133 months

Using the formula given above, Billy’s IQ is

$$\mathrm{IQ} = \frac{\mathrm{MA}}{\mathrm{CA}} \times 100 = \frac{133}{124} \times 100 \approx 107$$
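
The same computation can be written as a short program. This is a minimal sketch of the ratio formula, with illustrative function names; rounding to the nearest whole number is our choice, made to match the figure of 107.

```python
def to_months(years, months):
    """Convert an age given in years and months to months."""
    return years * 12 + months


def ratio_iq(mental_age_months, chronological_age_months):
    """IQ as the ratio of mental age to chronological age, times 100."""
    return round(100 * mental_age_months / chronological_age_months)


billy_ca = to_months(10, 4)  # 124 months
billy_ma = to_months(11, 1)  # 133 months

print(ratio_iq(billy_ma, billy_ca))  # 107
```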

It is evident that the IQ is a ratio of level of mental devel- 
opment to chronological age. Hence it indicates the present 
rate of mental development or the relative brightness of the 
individual. Billy, in the example above, is developing men- 
tally at a rate slightly faster than that of the hypothetical aver- 
age child, as indicated by his IQ of 107. The child whose 
mental and chronological ages are exactly equal has an IQ of 
100. Actually, the normal rate of development or average 
brightness might possibly be best defined as the IQ range be- 
tween 90 and 110. 

Some Cautions. As we have seen, the IQ represents rate of 
mental development or relative brightness. The teacher should 
remember, however, that basically it is derived from a test 
score and is subject to the usual limitations which pertain to 
such scores. No test is completely reliable; on similar test ma- 
terials a pupil may perform better at one time than at another, 
since he may feel better or have a better attitude toward the 
test at one time than at another. When he takes the test, he 
may be at his best with reference to some types of materials 
and at his worst with reference to others. Thus the test at best 
can only sample his abilities; it cannot give complete cover- 
age. All this indicates that the teacher should employ at least 
as much caution in the interpretation of the IQ as in the in- 
terpretation of any other test score. In fact, because of the 
particular implications which have become attached to the 
concept of IQ, the need for cautious interpretation can 
scarcely be overemphasized. 

Many persons feel that the IQ should be regarded as con- 
stant and that therefore one test result, even though it is sev- 
eral years old, should be sufficient evidence of the brightness 
of a pupil. This is far from the truth. The IQ is only roughly 
constant; although it does not ordinarily fluctuate widely
from one test to another, it nevertheless does vary, and differences
of at least ten points on the usual tests can be expected.
Still larger fluctuations occur, even when similar tests
are used. When tests are administered at intervals of several
years, marked deviations are not at all uncommon.5 In fact,
the results of intelligence tests administered during the preschool
or primary-grade period should probably be given little
credence if they are over a year old. Although test results
for older pupils appear to be more stable, it is doubtful
whether they are really dependable over any lengthy period
of time.6

IQs derived from different intelligence tests are not com- 
parable. The fact that two tests are labeled “mental” or “in- 
telligence” tests does not mean that the tasks included in them 
are at all alike. A child may have an IQ of 107 on one test, 
95 on another, and 115 on a third. Hence, to interpret any 
IQ the teacher should know the name and nature of the test 
from which it was derived. This is one of the reasons why it is 
desirable to use the same test with comparable forms over a 
number of years. This practice will tend to eliminate com- 
parisons between measures that are not necessarily equiva- 
lent. Estimating the capacity for learning of individual pupils 
can be facilitated by the informed use of tests; but the estima- 
tion is not accomplished by a single test score — it is the result 
of properly correlating many data. 

It is a recognized fact, too, that children grow at different 
rates and that these rates vary within individuals. This is as 
true with mental growth as it is with any other aspect of 
growth. These variations, which may be due to a variety of
factors, innate and environmental, result in sufficient variation
in the IQ that the teacher must exercise care when applying
interpretations to individual cases.

5 John P. Zubek and P. A. Solberg, Human Development, New
York: McGraw-Hill Book Company, Inc., 1954, p. 282.

6 J. E. Anderson, “The Limitations of Infant and Preschool Tests
in the Measurement of Intelligence,” Journal of Psychology, 8:351-379,
1939. K. P. Bradway, “IQ Constancy on the Revised Stanford-Binet
from the Preschool to the Junior High School Level,” Journal
of Genetic Psychology, 65:197-217, 1944.

Environmental Influences 

It is commonly assumed that intelligence is inherited. Al- 
though there is evidence that tends to support this point of 
view, it has been demonstrated that environmental conditions 
in the home and community, school attendance, and other fac- 
tors influence IQ. Consequently it is perhaps wisest to con- 
sider that heredity sets limits to individual potentialities, but 
that these potentialities develop in response to environmental 
stimulation. 

In judging a pupil’s ability to learn, the teacher must bear 
in mind the fact that his opportunities to develop his abilities, 
at least along scholastic lines, may have been very limited. 
Given more stimulating environmental conditions, his measured
ability might change considerably. For example,
Wheeler7 has shown that the general test level of a mountain
community improved over a period of ten years during which
improvements occurred in the general environment. Wellman
and Pegram8 and Skodak and Skeels9 have shown that differences
in environmental conditions are related to changes in
tested intelligence.

The kind of social and economic environment from which
a child comes influences his tested intelligence markedly.10
Most intelligence tests tend to favor middle-class and upper-class
children and to discriminate against lower-class children;
that is, the test items are concerned with materials more
familiar to one group of children than the other. Verbal items
most markedly favor middle-class as against lower-class children,
whereas nonverbal and pictorial materials discriminate
least against lower-class children. This being the case, the
teacher must consider the child’s background in evaluating
test scores. Although test results may indicate the child’s ability
to deal with a standard school curriculum, they may not indicate
his potential ability to solve problems in areas related
to his experience.

7 L. R. Wheeler, “A Comparative Study of the Intelligence of East
Tennessee Mountain Children,” Journal of Educational Psychology,
33:321-334, 1942.

8 Beth L. Wellman and E. L. Pegram, “Binet IQ Changes of Orphanage
Preschool Children: A Reanalysis,” Journal of Genetic Psychology,
65:239-263, 1944.

9 M. Skodak and H. M. Skeels, “A Follow-up Study of Children in
Adoptive Homes,” Journal of Genetic Psychology, 66:21-58, 1945.

10 Kenneth Eells, A. Davis, R. J. Havighurst, V. E. Herrick, and
R. Tyler, Intelligence and Cultural Differences, Chicago: University of
Chicago Press, 1951.


PERCENTILE RANKS 

One of the most meaningful interpretations of mental-test 
results is that which compares the pupil with his own age 
group. Mental age, as we have seen, relates the child to an 
age group with ability similar to his own with regard to the 
sampling of mental tasks included in a specific test. The IQ 
purports to be a more general measure which indicates rela- 
tive brightness. This kind of comparison has its limitations, 
because an IQ of 115 at age eight is indicative of quite a dif- 
ferent level of mental functioning than an IQ of 115 at the 
age of twelve or fourteen. Although the quotients are similar, 
the expectations of the teacher must be related to the age 
group to which the pupil belongs; therefore, a measure which 
compares the child with those of his age is more meaningful. 

Comparisons with peer groups are most commonly given in 
the form of percentile ranks. Percentile norms indicate the 
relative standing of the child in defined age or grade groups; 
thus they provide a comparison of the child with others in a 
defined group. In making use of percentile norms the teacher 
should keep in mind that they represent variable units of
measurement, especially when he is tempted to make fine distinctions
in terms of others of like age.
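
One way to see why percentile units are variable is to express percentile ranks in standard-deviation units. The sketch below assumes that scores in the age group follow a normal distribution, an assumption made here only for illustration; the function name is likewise illustrative.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1 (assumed shape)


def standard_score_gap(lower_percentile, upper_percentile):
    """Distance, in standard deviations, between two percentile ranks."""
    lower = STANDARD_NORMAL.inv_cdf(lower_percentile / 100)
    upper = STANDARD_NORMAL.inv_cdf(upper_percentile / 100)
    return upper - lower


middle_gap = standard_score_gap(50, 60)
upper_gap = standard_score_gap(89, 99)

print(f"50th to 60th percentile: {middle_gap:.2f} standard deviations")
print(f"89th to 99th percentile: {upper_gap:.2f} standard deviations")
```

Under this assumption, ten percentile points near the middle of the group cover about a quarter of a standard deviation, while ten points near the top cover about a full standard deviation. Small percentile differences near the middle therefore deserve little weight.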

Norms based on standard scores (see Chapter 4) provide a 
further means of comparing the individual with members of a 
specified group. Norms of this type represent points along a 
scale, the units of measurement being equivalent throughout 
the length of the scale. Certain IQ measures are based on 
standard scores. They provide a means of comparing the in- 
dividual’s ability with specified age or educational groups. 
Since the basic unit, one standard deviation, is similar in 
meaning for different tests, norms based on standard scores 
provide a precise basis for comparison between tests and be- 
tween individuals. 
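
A standard score simply states how many standard deviations a score lies above or below the mean of the reference group. The following sketch uses a small hypothetical reference group. The convention of a mean of 100 and a standard deviation of 15 in `deviation_iq` is an assumption adopted for illustration; tests differ in the values they use, and the manual for the particular test governs.

```python
from statistics import mean, pstdev


def z_score(score, group_scores):
    """Standard score: distance from the group mean in standard deviations."""
    return (score - mean(group_scores)) / pstdev(group_scores)


def deviation_iq(z, center=100, spread=15):
    """Rescale a standard score to an IQ-like scale (assumed mean 100, SD 15)."""
    return round(center + spread * z)


# Hypothetical raw scores for a reference group (invented for illustration).
reference_group = [40, 45, 50, 55, 60]

z = z_score(60, reference_group)
print(f"standard score: {z:.2f}")
print(f"deviation IQ:   {deviation_iq(z)}")
```

Because the unit is the same throughout the scale, a difference of one standard deviation has the same meaning near the middle of the group as near its extremes, which is not true of percentile ranks.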

USING INTELLIGENCE-TEST RESULTS 

Intelligence tests, as we have seen, are valuable instru- 
ments, but their value is dependent upon competent use and 
interpretation. They are tools which may be used or misused. 
When carefully selected, administered, and interpreted, how- 
ever, they provide the teacher with significant data. 

The fundamental aim of the teacher is to assist each pupil 
to make the best use of his capacities, and the intelligence test 
is perhaps the best general measure of learning capacity with 
reference to schoolwork. This is especially true in the more 
academic areas such as reading, composition, arithmetic, and 
spelling. Mental-age scores permit the teacher to estimate the 
mental level at which a child is likely to work most effectively, 
and through the use of the IQ and percentile rank, the teacher 
is able to formulate expectations regarding the pupil’s capac- 
ity to learn in academic areas. In the case of a new pupil or a 
new class, such information may be vital to the success of the 
program of instruction. 

Intelligence-test results are useful, too, in helping the 
teacher to diagnose learning disabilities. Is Johnny capable of 
doing better work in reading or arithmetic? Is Mary “just 
plain lazy,” or is the work too difficult or too easy for her? 
Is Ronny rebellious because he is unable to work at the level 
expected of him, or is there some other reason for his atti- 
tude? Educational problems are seldom simple. The intel- 
ligence-test result is ordinarily, therefore, only one of several 
types of evidence that should be examined in reaching a deci- 
sion about a learning problem. However, it is important that 
capacity for learning, as indicated by test results, be given due 
consideration. 

In making more specific diagnoses, the teacher may wish to 
examine the pupil’s ability profile. Today, many tests make 
such profiles available. Such instruments may provide esti- 
mates of ability in areas such as number concepts, arithmetic 
reasoning, and verbal or spatial reasoning. Study of the pu- 
pil’s profile may help the teacher reach a tentative decision 
concerning specific weaknesses which may be interfering with 
his progress. In addition, the profile may indicate strengths 
which may be utilized to encourage more effective learning. 

Intelligence-test results may also be utilized in grouping 
pupils for classroom work. The intelligence-test score is, of 
course, only a rough index of ability to do schoolwork; many 
other factors enter into and influence achievement. Neverthe- 
less, at times the teacher may well wish to consider tested in- 
telligence in grouping his pupils. At other times he may group 
them on some other basis for work on units or projects which 
offer a variety of tasks requiring varying levels and kinds of 
mental ability. In deciding upon the part which the individual 
pupil is to play, the teacher must consider his general-ability 
level and his special aptitudes as well as his interests and 
needs. 

The teacher can derive from intelligence-test results an esti- 
mate of the range of abilities represented by his class. In a 
class in which the range is small, the teacher’s program may 
be quite different from that required for a class in which the 
pupil group has a wide range of general ability. In other in- 
stances the teacher may find that the general ability of his 
group is rather low or rather high by comparison with norms. 
In either case his expectations for the group will be influenced 
by the results of the tests. Clothing which is too small or too 
large is usually uncomfortable, and teacher expectations 
which do not “fit” the group and the individual are unlikely 
to be comfortable and satisfying for either the teacher or the 
pupil. 


SUMMARY 

Among the more urgent and vital problems of the teacher 
is that of developing suitable expectations for his pupils, both 
as a group and as individuals, in order to adapt the curricu- 
lum to the educational potentialities of individual pupils. 
Tests of mental maturity, intelligence, or scholastic aptitudes, 
as they are variously called, offer objective evidence that helps 
the teacher to make sound judgments about his pupils. 

Intelligence tests, to use the customary label, may give a 
single measure, called general intelligence, or may provide a 
profile of a series of abilities. In either case the teacher’s in- 
terpretation of the test results must be based upon a knowl- 
edge of the kinds of materials included in the test and upon 
the kinds of mental abilities which these materials sample. 

In order to give meaning to test scores, results are con- 
verted into mental ages, intelligence quotients, percentile 
ranks, or standard scores. Each of these derived measures has 
its advantages and its limitations. The mental age relates the 
child to an age group in mental ability. The intelligence quo- 
tient is a measure of relative rate of mental development, or 
of relative brightness, and represents a comparison with the 
population on which the test was standardized. Percentile 
ranks make possible a comparison of the child with others in 
defined age groups, ordinarily his own age group. Difficulty in 
the interpretation of percentile ranks derives from the fact 
that they represent unequal units of measurement. This dif- 
ficulty is overcome when standard scores are used, and the 
advantage of indicating status with reference to a defined age 
group is preserved. In using any of these mental-test “norms,” 
the teacher must interpret the results in terms of the test ma- 
terials, the sampling of mental abilities represented, and the 
specific conditions of testing. 

Intelligence-test results enable the teacher to help his pupils 
make the best use of their capacities. In adapting the school 
program to individual differences, mental-test results can be 
used to promote sound diagnosis of general or specific learning 
problems, effective grouping of pupils for various purposes in 
the classroom, and judicious gearing of expectations to the 
abilities of the group and the individual. However, tests are 
tools which may be used wisely or carelessly and ineffectively, 
for the value of mental-test results in classroom use is deter- 
mined by the wisdom of the teacher’s interpretations, judg- 
ments, and applications. 


STUDY AND DISCUSSION EXERCISES 

1. From your reading and from a study of group mental tests, 
describe the bases upon which you would select a test for use with 
your classroom group at a specified grade level. 

2. Explain: “The results of a test of intelligence which empha- 
sizes verbal abilities may not be completely valid for all pupils 
in a classroom group.” 

3. In some schools, ability grouping on the basis of mental-test 
results is used as a means of facilitating instruction. Discuss the 
values and limitations of mental tests as a means of grouping 
pupils for instruction. 

4. The relationship between IQ and school achievement is 
positive but not strong. List a number of reasons for this apparent 
discrepancy between ability and achievement. 

5. Evaluate the concepts of MA and IQ as partial bases for: 

a. deciding on the optimum grade placement of a new pupil 
at the elementary school level. 

b. estimating the probable readiness for beginning reading. 

c. estimating an individual’s chances of success in a college 
preparatory program in high school. 

6. Mr. S., principal of an elementary school, insists that records 
of intelligence-test results in the cumulative records include the 
name and form of the test and the date of testing. Miss ___ objects 
to this requirement, maintaining that the MA or IQ is all that is 
necessary. Which point of view would you support? What are your 
reasons? 

7. Select a test of general mental ability. Analyze the materials 
included in the test and describe the aspects of intelligence upon 
which the test results are based. 

CHAPTER SIX 


Evaluating Pupil Achievement 


The assessment of achievement has long been considered a 
primary responsibility of the school and the classroom teacher. 
Traditionally assessment has been concerned with the pupil’s 
accumulation of knowledges and skills in such areas as read- 
ing, arithmetic, language, geography, and history. However, 
educational goals and values are steadily changing, and con- 
cepts of measurement and evaluation are being revised. Al- 
though teachers are still concerned with the scholastic achieve- 
ments of pupils, evaluation of pupil status and progress is 
influenced by changes in our concepts of the larger goals of 
education. It is being increasingly recognized that academic 
achievements are means to the attainment of educationally 
worthwhile goals rather than ends in themselves. This shift 
in emphasis implies revisions in the nature of measuring in- 
struments, in the purposes of measurement, and in the utiliza- 
tion of the results of measurement. 

THE NATURE OF ACHIEVEMENT 

Aptitude for learning must be distinguished from achieve- 
ment, or the actual learning accomplished. However, in order 
to measure aptitude for learning, which is commonly identi- 
fied with intelligence, the designer of intelligence tests 
must use an indirect approach which involves learning or 
achievement. Intelligence cannot be measured directly but 
must be inferred from its products or its application to various 
types of materials. Hence, in order to accomplish his purpose, 
the maker of intelligence tests attempts to discover what the 
individual has learned in situations experienced by a vast 
majority of persons. Thus he presumes that all persons have 
had opportunities to learn in these areas and that, in the 
majority of instances, differences in test scores reflect differ- 
ences in aptitude rather than in opportunity for learning. An- 
other possibility open to the test-maker is to develop situations 
that are so completely novel that very few persons are likely 
to have had prior experiences in the area being tested. Hence, 
mental tests assess the individual’s capacity or aptitude only 
by inference from his achievements with respect to very com- 
monplace or very novel materials and situations. In other 
words, the mental test is ordinarily a measure of a very gen- 
eral type of achievement. 

Customarily, the scholastic-achievement test differs from 
the general-aptitude test in that results of scholastic tests are 
considered to be dependent upon the acquisition of specialized 
skills and knowledges, usually as a result of special training. 
For example, the individual is provided with opportunities to 
learn in such fields as reading, writing, music, spelling, and 
arithmetic, and his achievement-test results reflect his attain- 
ments in these areas. Basic to the consideration of achieve- 
ment are the ideas of opportunity to learn and attainment of 
skill or knowledge as a result of learning. 

In an intermediate position, between and overlapping the 
concepts upon which aptitude and achievement tests are 
based, are the so-called readiness tests, which measure the 



jectives. It must be reemphasized that measuring instruments 
are tools whose value depends upon the skill with which they 
are utilized. 

Among the most widely used instruments of evaluation are 
achievement-test batteries designed to provide a general sur- 
vey of skill or knowledge attained in specific academic areas. 
Commonly included in these batteries are tests of reading, 
arithmetic or mathematics, spelling, social science, physical 
science, and language or literature. 

The basic consideration in the selection of a battery of 
achievement tests is the extent to which the results of testing 
will reflect status or progress relative to worthwhile educa- 
tional objectives. The teacher should be familiar with the 
achievement batteries available and should study carefully the 
general aspects of the educational program implied by the 
various tests. 

The well-known test batteries are of two principal types: 
those that emphasize acquisition of fundamental skills, as, for 
example, in reading and computation, and those that emphasize 
acquisition of factual knowledge in content areas. A number 
of widely used achievement-test batteries include tests of both 
skills and knowledge. A test battery emphasizing development 
of basic skills at the elementary school level might include 
tests of work-study skills, reading, language arts, arithmetic 
skills, and spelling. The test items are oriented toward the 
measurement of skills and processes fundamental in the area; 
in arithmetic, for example, items on number concepts, prob- 
lem solving, and the fundamental processes of addition, sub- 
traction, multiplication, and division might be included. 

Achievement batteries designed to measure knowledge of 
content may include tests in the areas of literature, social 
studies, science, and so on. The test items will be based on 
subject matter suited to pupils at the grade level for which 
the test is designed. At the seventh-grade level, for instance, 
items in social studies might concern the voyages of Colum- 
bus, American history, and the geography of North America 
and possibly of other continents. A “content” item asks for 
information such as: “When did Columbus sail for America?” 
“Where did he land?” Skill items in the area of social studies 
might involve map reading or interpretation of information 
given in the test. 

Some achievement-test batteries provide extensive surveys 
of educational achievement, and others deal more intensively 
with a limited area. The extensive or survey type of test bat- 
tery typically includes a wide variety of measures, but each 
measure is likely to be based on a limited number of items. 
Economy of administration time and the cost of the test are 
essential practical considerations in the development of a 
test battery of this type. A survey battery might 
include tests in all or most of the following basic areas of 
learning: reading, arithmetic, spelling, language, study skills, 
social studies, science, and literature. Such extensive sampling 
of educational fields is likely to preclude intensive treatment 
in any one area. However, the survey battery provides valua- 
ble information in a relatively economical fashion. 

Other tests, as we have seen, sample intensively a limited 
curricular area. For example, if the teacher wishes to make 
an intensive study of an area such as reading skills, arithmetic, 
social studies, science, literature, or language, tests of more 
limited range provide opportunities for intensive analysis. 

The teacher will wish to examine the test items in order to 
determine whether the various aspects of achievement are 
adequately sampled. He will also wish to decide whether the 
items require (1) use of acquired skills, (2) use of acquired 
facts or information, (3) information or skills suited to the 
curriculum his pupils have been following, (4) responses 
which will provide data concerning pupil progress toward the 
objectives which he has envisaged for his classroom group. 

In many instances the classroom teacher does not have the 
opportunity to select the tests he would prefer; frequently he 
is provided with test batteries selected by someone else. This 
situation may not be so hopeless as it seems. Study of the tests 
may reveal specific areas or item groups well suited to pro- 
vide basic evaluative data for his special purposes. Other 
aspects of the test may suggest important emphases which the 
teacher has overlooked. 

It is worthwhile to observe the thought processes required 
of the pupil taking the test. In making such an analysis the 
teacher should inquire whether the items require (1) simple 
memorization of facts, (2) direct utilization of simple skills, 
(3) skill in obtaining factual information (as in the case of 
reference skills), (4) application of facts to the solution of 
problems, (5) ability to draw conclusions or to make infer- 
ences, (6) ability to develop generalizations, (7) ability to 
apply general conclusions or principles to specific situations, 
(8) ability to see relationships within given sets of data. 
Although these possibilities may not ordinarily be considered 
to reflect curricula in the usual sense of the term, they do rep- 
resent methods of evaluating achievement relative to signif- 
icant educational goals such as those concerned with the de- 
velopment of skill in the use of higher mental processes. 
Through even a cursory analysis of this type, the teacher 
comes closer to an understanding of the test and the philos- 
ophy which it represents. 

The teacher should look upon the test as a tool which, al- 
though it may not be ideally suited to his task, may yet be 
far more advantageous than no tool at all. By way of analogy, 
one does not say that a tool has no value at all because it is 
merely one of many required to construct a house. The 
teacher would do well to consider the achievement test with 
the same critical eye with which the carpenter or mechanic 
surveys his tools. Which tool among those available is best 
suited to the task at hand? In the absence of the ideal tool, 
what possibilities are there for adapting those at hand? 

Using Achievement Norms 

The standardized achievement test differs from the informal 
classroom test in a number of important respects. During the 
process of standardization, the test items are subjected to 
critical experimentation and examination to determine their 
usefulness as units of measurement. Uniform procedures for 
administering and scoring are developed. Relative values or 
norms are established as a basis for interpretation of the test 
scores. These procedures give the standardized test certain 
special values unobtainable in less formal tests. However, the 
problems involved in standardization and large-scale publica- 
tion impose limitations upon the values of such tests. Thus 
both standardized tests and the more flexible, informal tests 
serve their unique purposes in the evaluation program of the 
classroom teacher. 

Grade Norms. The type of norm most widely used with 
achievement tests is perhaps the grade equivalent or grade 
norm. The grade norm, like other types of norms, is a ref- 
erence point which the teacher uses as a basis for the in- 
terpretation of achievement-test scores. It represents neither 
the level of achievement to be expected of an individual child 
nor a standard of achievement for children at a specified grade 
level. The norm is merely the score which is typical of the 
attainment of a sampling of children at a particular grade 
level. 

In order to derive the grade norm, the teacher scores the 
test and accumulates total raw scores for the several areas of 
the test under the appropriate headings, for example, reading, 
arithmetic, and language. The teacher then refers to the set of 
norms provided with the test and records the grade norm 
which each score represents. For the test in Figure 8, the 
sum of scores in the three areas, that is, the score over the 
entire test, could also be converted to the appropriate grade 
norm. Figure 8 represents the achievement of a fifth-grade 
pupil on our hypothetical achievement-test battery. The first 
column lists the three subtests and a space for the total score 
for the battery. The second column lists the raw or uncon- 
verted scores, the total number of items answered correctly 
in each subsection. John scored 72 points in reading, 58 
points in arithmetic, 69 points in language, and, by summa- 
tion, 199 points over the entire battery of tests. In the third 
column are the grade norms appropriate to John’s raw scores, 
derived from the table of grade norms which accompanied the 
test. 

The Achievement Test, Form A for Grades 4-7 

Pupil: John      Age: 10.6      Date: 2/3/56 
School:          Grade: 5.6     Teacher: 

Test          Raw score   Grade norm   Percentile norm 
Reading           72          6.3            75 
Arithmetic        58          4.0            20 
Language          69          5.5            50 
Total            199          5.4            47 

Fig. 8. A hypothetical test situation. 

In order to interpret these norms, which are representative 
of John’s achievement, we must ask, “What does the figure 
6.3 represent with reference to the reading test?” The refer- 
ence point 6.3 typifies the performance of pupils who have 
spent three months in the sixth grade. Although at the time 
of testing, John has been in the fifth grade for a period of six 
months, his achievement on this test of reading is at a much 
higher level than the typical performance of pupils of his 
grade; it is more typical of sixth than of fifth graders. On 
the basis of the same type of reasoning, we must conclude that 
John is experiencing some difficulty with arithmetic, or, at 
any rate, with the type of material represented in our achieve- 
ment test. His grade norm in arithmetic is 4.0, which indicates 
that his score on the arithmetic section of the test is typical 
of beginning fourth graders. His scores in language and his 
total scores are fairly typical of pupils at his grade level, but 
the breakdown of the three test areas indicates a field of ac- 
celeration in reading and an area of some difficulty in arith- 
metic. This interpretation is based upon comparison of John’s 
achievement on the test with the typical attainments of pupils 
at various grade levels. Thus grade norms provide the teacher 
with a means of comparing the scores of his pupils with scores 
typical of pupils at specified grade levels. 
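
The comparison just described can be sketched in a few lines of Python. The grade norms and the grade placement of 5.6 are taken from Figure 8; the function names and the convention of ten school months to a grade are assumptions made only for this illustration, not part of any published test.

```python
# Grade placement and grade norms from Figure 8 (John, grade 5.6).
GRADE_PLACEMENT = 5.6
grade_norms = {"Reading": 6.3, "Arithmetic": 4.0, "Language": 5.5, "Total": 5.4}


def to_months(grade_equivalent, months_per_grade=10):
    """Convert a grade equivalent such as 6.3 (sixth grade, third month)
    to a count of school months. Ten months per grade is an assumption."""
    grade = int(grade_equivalent)
    month = round((grade_equivalent - grade) * 10)
    return grade * months_per_grade + month


def months_from_placement(grade_norm, placement=GRADE_PLACEMENT):
    """Positive values lie above the score typical of the pupil's placement."""
    return to_months(grade_norm) - to_months(placement)


differences = {test: months_from_placement(norm) for test, norm in grade_norms.items()}

for test, diff in differences.items():
    print(f"{test:<12}{diff:+d} months")
```

Under these assumptions John's reading norm lies seven school months above his placement and his arithmetic norm sixteen months below it, which agrees with the interpretation given above.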

Percentile Norms.¹ As we have seen, grade norms make it 
possible to compare scores with the performance typical of 
pupils at various grade levels. Percentile norms, on the other 
hand, indicate the relative standing of a pupil within a speci- 
fied age or grade group. Norms of this type help the teacher 
answer the question, “How well is this pupil achieving by 
comparison with others of his age or grade?” This compari- 
son indicates the standing of the pupil with reference to the 
group which was used as a basis for establishing the norms. 
It has no reference to his status within his own classroom 
group. 

Before assigning the percentile norms for a particular set 
of test results, the teacher should consult the test manual and 
study the definitions,² descriptions, and directions concerning 

¹ For a general discussion of percentile norms, see Chap. 4, pp. 
53-57. 

² See John C. Flanagan, “Units, Scores, and Norms,” in E. F. Lind- 
quist (ed.), Educational Measurement, Washington: American Coun- 
cil on Education, 1951, pp. 717-719, for a discussion of the problem 
of definition with reference to percentiles. 

norms. In general, the teacher will employ the following pro- 
cedures: 

1. Sum the scores to derive the various totals for which 
percentile norms are provided in the table of norms. In the 
example of John in Figure 8, these might be the norms for 
reading, arithmetic, and language and the total achievement- 
test score. 

2. From the tables of percentile norms for the age or grade 
level in question, find the norms appropriate to the raw scores 
attained by each pupil. 

3. Utilizing the definition provided in the manual, interpret 
the percentile norms. For example, the manual of the test 
John took provides the following customary definition: “A 
percentile norm indicates the per cent of scores of pupils at 
the specified grade levels which fall below that point.” John’s 
score in reading, for example, is equivalent to percentile 75. 
In terms of our definition, his score is higher than that of 75 
per cent of pupils at his grade level. The teacher will remem- 
ber that this norm refers to the group of fifth-grade pupils 
whose scores on this test formed the basis for the develop- 
ment of this particular set of percentile norms. With this 
concept in mind, we can conclude that John’s test score indi- 
cates a relatively superior achievement in the area of reading 
by comparison with fifth graders generally. Similarly, the per- 
centile norm of 20, indicating John’s relative achievement in 
arithmetic, points to an area of some difficulty, although his 
score is better than that attained by 20 per cent of pupils at 
his grade level. The other scores may be converted and in- 
terpreted in the same way to give us a picture of John’s rela- 
tive achievement status. 
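
The definition quoted in step 3 can be illustrated with a short Python sketch. The norm group below is invented for the purpose (twenty hypothetical fifth-grade reading scores) and the function name is ours; in practice the teacher reads the percentile norm from the publisher's table rather than computing it.

```python
def percentile_norm(score, norm_group_scores):
    """Per cent of norm-group scores that fall below `score`."""
    below = sum(1 for s in norm_group_scores if s < score)
    return 100 * below / len(norm_group_scores)


# Hypothetical norm group: twenty fifth-grade reading scores (invented data).
norm_group = [40, 44, 47, 50, 52, 55, 57, 59, 60, 62,
              64, 65, 67, 68, 70, 74, 76, 79, 83, 88]

john_reading = 72  # John's raw reading score from Figure 8
print(percentile_norm(john_reading, norm_group))
```

Fifteen of the twenty invented scores fall below 72, so the sketch yields a percentile norm of 75. The figure describes John's standing in the norm group only, not within his own classroom.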

Age Norms. Publishers of standardized tests of educational 
achievement commonly provide age norms, or age equivalents, 
in addition to grade and percentile norms as a basis for the 
interpretation of test scores. The age norm is interpreted in 
much the same manner as the grade norm, but its reference 
point is the typical performance of the age group rather than 
of the grade group. The age norm derived from general- 
achievement tests is frequently termed educational age, or 
EA. Age norms related to specific subject areas may be pre- 
sented as reading age, arithmetic age, and so on. 

Age equivalents are helpful when the teacher wishes to 
compare the reading, arithmetic, language, or educational age 
of a pupil with his chronological age. They also enable the 
teacher to compare the mental and educational ages of pupils 
in his classroom. However, when two sets of results are com- 
pared — mental and achievement-test results, for example — 
the teacher is cautioned to remember that the norms are de- 
rived from two distinctly different tests which ordinarily have 
been standardized on two different groups of pupils. Again, 
the attainment of a pupil in one subject is likely to differ from 
his attainment in another. In addition, there is evidence that 
intelligence is not a unitary function but is comprised of a 
number of abilities which are not necessarily closely related 
to one another.³ Thus age norms of different types are not 
so directly comparable as they at first appear to be. 

Standard-score Scales. Although standard scores are fre- 
quently presented as a basis for the interpretation of test re- 
sults, the more common practice is to utilize standard-score 
scales as a basis for the derivation of other types of norms.⁴ 
That is, a standard-score scale is developed from the raw 
scores, and these standard scores are then used to derive age, 
grade, or percentile norms. Such procedures are designed to 
utilize some of the particular advantages of standard scores. 
(See Chapter 4, pp. 57-63.) 
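
As a rough illustration of the first step, the sketch below converts raw scores to one common form of standard score, the deviation from the group mean expressed in standard-deviation units. The raw scores are invented, and published batteries develop their own scales by their own procedures, so the sketch shows the general idea only.

```python
from statistics import mean, pstdev


def standard_scores(raw_scores):
    """Express each raw score as its distance from the group mean,
    measured in standard deviations of the group."""
    group_mean = mean(raw_scores)
    group_sd = pstdev(raw_scores)
    return [(score - group_mean) / group_sd for score in raw_scores]


# Hypothetical raw scores for a small standardization sample (invented data).
raw = [50, 60, 70, 80, 90]
z = standard_scores(raw)

for score, value in zip(raw, z):
    print(f"raw {score}: standard score {value:+.2f}")
```

Because equal differences in standard score represent equal differences in raw score, a scale of this kind avoids the unequal units of percentile ranks.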

³ L. L. Thurstone, Primary Mental Abilities, Psychometric Mono- 
graphs, no. 1, Chicago: University of Chicago Press, 1938. 

⁴ See, for example, Metropolitan Achievement Tests, Yonkers, N.Y.: 
World Book Company, 1946, 1947. 

RELATING ABILITY AND ACHIEVEMENT 


Recognition of the fact that pupils of a given grade do not 
all have the same capacity to achieve in any subject area has 
led to efforts to group pupils in such a way as to reduce the 
range of individual differences and to adapt curricular mate- 
rials and teaching methods to the needs and capacities of in- 
dividual pupils. Evaluative procedures have been designed to 
relate capacity and achievement on the assumption that an 
achievement “expectancy” can be developed for the individual 
pupil or for a group of pupils. 

investigators as summarized by Cook indicates that the extent 
of variation in capacity and achievement would not be 
markedly altered if pupils were classified by chronological 
age rather than by grade level.⁸ 

In developing measures of expectancy of achievement, it 
has frequently been assumed that capacity to learn, as meas- 
ured by intelligence tests, is closely related to achievement 
status. Summaries of a number of investigations of the actual 
relationships between measured intelligence and achievement 
reveal that although some relationship is evident,⁹ it is far 
from sufficient to be used as a basis for grouping pupils. 
Furthermore, the degree of relationship between measured 
capacity and achievement varies from one subject-matter area 
to another.¹⁰ The findings referred to indicate that differences 
between expectancy and achievement are customary. 

In the attempt to answer the question, “Is this pupil work- 
ing at the level of his ability?” the teacher may be tempted 
to relate directly mental-age norms and achievement-age 
norms for the pupils in his classroom. The accomplishment 
quotient (AQ) has been proposed as a single index of this 
relationship. It is derived by dividing the age norm obtained 
from an achievement test (EA) by the age norm obtained 
from a mental test (MA) and multiplying the result by 100. 
The formula for derivation of the AQ is: 

$$\mathrm{AQ} = \frac{\mathrm{EA}}{\mathrm{MA}} \times 100$$

If the mental and educational ages of a pupil are identical, 
his AQ is 100. He would then be considered to be achieving 

⁸ Cook, “The Functions of Measurement in the Facilitation of 
Learning,” in Lindquist (ed.), op. cit., pp. 10-24. See also Quinn 
McNemar, The Revision of the Stanford-Binet Scale, Boston: Hough- 
ton Mifflin Company, 1942, pp. 26-28. 

⁹ J. M. Stephens, Educational Psychology, New York: Henry Holt 
and Company, Inc., 1951, pp. 228-231. 

¹⁰ Ibid., pp. 228-229. 

at the level of his capacity. If the MA of a pupil exceeded 
his EA, his AQ would be less than 100 and he would be 
considered to be achieving at a lower level than could be 
expected. If, on the other hand, the EA of a pupil exceeded 
his MA, his AQ would be over 100 and he could be con- 
sidered to be achieving at a higher level than could reason- 
ably be expected. 
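
The arithmetic of the AQ can be shown in a brief Python sketch. The ages below are hypothetical and are expressed in months; the function name is ours. The sketch illustrates the computation only and does not imply that the index should be relied upon.

```python
def accomplishment_quotient(educational_age, mental_age):
    """AQ = EA / MA x 100, with both ages in the same unit (months here)."""
    if mental_age <= 0:
        raise ValueError("mental age must be positive")
    return 100 * educational_age / mental_age


# Hypothetical pupils: (EA in months, MA in months).
pupils = {"A": (120, 120), "B": (108, 120), "C": (132, 120)}

quotients = {name: accomplishment_quotient(ea, ma) for name, (ea, ma) in pupils.items()}

for name, aq in quotients.items():
    print(f"Pupil {name}: AQ = {aq:.0f}")
```

Pupil A, whose two ages are identical, has an AQ of 100; pupil B, whose MA exceeds his EA, has an AQ of 90; and pupil C, whose EA exceeds his MA, has an AQ of 110.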

For a number of reasons, the use of the AQ in relating 
ability and achievement is not recommended. Among these 
reasons are the following: 

1. It is doubtful whether the relationship between achieve- 
ment in different academic areas is sufficiently high to warrant 
the use of a composite or average index, such as EA or AQ, 
for this type of comparison. 

2. The AQ is unreliable. 

3. EA and MA should be expressed in comparable units 
and in terms of norms from comparable samples if a ratio of 
the two is to be developed. Ordinarily this is not the case. 

4. There is sufficient evidence of variation among aspects 
of intelligence within the individual to make the use of the 
general mental-age index of questionable value in predicting 
achievement in specific subject areas. 

5. Investigations of the existing relationships between in- 
telligence-test results and achievement-test results fail to in- 
dicate the existence of such a direct relationship as that upon 
which the AQ is based. 

6. There is a general tendency for the average AQ of 
pupils of high ability to show underachievement, whereas low- 
ability pupils in general appear to be overachieving according 
to their AQs. 

In comparing measured ability and achievement, the teacher 
should use caution in drawing conclusions from average meas- 
ures such as MA and EA. Obtained differences present cer- 
tain facts with only limited reliability, but these facts may help 
the teacher frame hypotheses as to the reasons for the dif- 
ference in the case of the individual pupil. By testing these 
hypotheses carefully, the teacher may reach conclusions 
which will give him direction in his work with individuals. 

THE ANALYSIS AND DIAGNOSIS OF 
LEARNING PROBLEMS 

The total score obtained from an achievement-test battery 
gives the teacher some idea of the pupil’s accomplishment. No 
analysis of specific areas of achievement is indicated; the re- 
sult is analogous to the remark of the patient who tells his 
doctor, “I don’t feel well.” Such a statement of fact is of rela- 
tively little value except as an indication of a need for as- 
sistance. The medical practitioner is likely to ask questions 
designed to analyze the situation, such as, “Do you have 
pain?” “Where is the pain located?” “When do you feel this 
way?” “How long have you felt this way?” The answers to 
these questions concerning the patient’s condition may result 
in a specific diagnosis of his problems. 

Similarly, subject scores on an achievement-test battery 
may help the teacher locate and diagnose learning problems. 
An analysis of achievement in the following fundamental 
curricular areas is readily available to the teacher, as indicated 
by the title of the subtests of the Metropolitan Achievement 
Test: (1) reading, (2) vocabulary, (3) arithmetic funda- 
mentals, (4) arithmetic problems, (5) language usage, and 
(6) spelling.¹¹ 

Since norms are commonly provided for each area as well 
as for the entire test, such a test makes it possible for the 
teacher to analyze the relative strengths and weaknesses of 
his class and of individual pupils. Although this type of analysis 

¹¹ Metropolitan Achievement Tests, Elementary Battery, Yonkers, 
N.Y.: World Book Company, 1948. 

is very general in nature, it may be useful in indicating 
areas requiring special emphasis. In this way, study of the 
test results can increase the effectiveness of the educational 
program. 

However, detailed diagnostic procedures must be based 
upon a more specific analysis of test results. An analysis of 
the processes required in such areas as arithmetic, language, 
or reading indicates that numerous skills are involved. Arith- 
metic fundamentals, for example, involve addition, subtrac- 
tion, multiplication, and division. A test in arithmetic funda- 
mentals subdivided according to these four areas will reveal 
strengths or weaknesses which the teacher should take into 
account. 
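The tally described above can be sketched in a few lines of Python. The item records below are hypothetical and are not drawn from any published test; the sketch assumes only that each item has been labeled with the operation it requires and marked right or wrong.

```python
from collections import defaultdict

# Hypothetical item records for one pupil: (operation tested, answered correctly?).
responses = [
    ("addition", True), ("addition", True), ("addition", True), ("addition", False),
    ("subtraction", True), ("subtraction", False), ("subtraction", False), ("subtraction", False),
    ("multiplication", True), ("multiplication", True), ("multiplication", False), ("multiplication", True),
    ("division", True), ("division", False), ("division", True), ("division", False),
]


def percent_correct_by_skill(records):
    """Return a dict mapping each skill to the percent of its items answered correctly."""
    right = defaultdict(int)
    total = defaultdict(int)
    for skill, correct in records:
        total[skill] += 1
        if correct:
            right[skill] += 1
    return {skill: 100.0 * right[skill] / total[skill] for skill in total}


profile = percent_correct_by_skill(responses)

# List the weakest areas first so that they stand out.
for skill, pct in sorted(profile.items(), key=lambda pair: pair[1]):
    print(f"{skill:15s} {pct:5.1f}%")
```

With only four items per operation, such a profile is suggestive at best and should be treated as a starting point for further study rather than as a diagnosis.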

Exact diagnosis of learning problems may involve even 
more specific analysis of learning difficulties. For instance, the 
process of subtraction in itself involves a number of skills, 
any of which may represent a point of difficulty for the in- 
dividual pupil. The following list indicates some possibilities 
of analysis within the process of subtraction:¹² simple
combinations; borrowing; zeros; subtracting money; subtracting
numerators; common denominator; whole from mixed num- 
bers; borrowing, mixed numbers; fractions and decimals; 
writing decimals; and denominate numbers. Analysis of test 
results at this level of specificity helps the teacher to determine 
more exactly the particular problems his students are experi- 
encing in the process of subtraction. 

Analysis of the scores from general-achievement batteries, 
therefore, will provide the teacher with information in pro- 
portion to the specificity of the results. However, the number 
of items which involve a specific process must be strictly lim- 
ited in a general-achievement test, and the results for any 
small group of items are likely to be statistically unreliable,
although they may be suggestive of possible difficulties. To
provide opportunity for more comprehensive analysis,
standardized diagnostic tests or tests especially developed by the
teacher may be utilized when a general area of difficulty has
been identified or when intensive analysis of a survey test of
achievement has indicated a possible source of difficulty.
When pupils appear to be experiencing difficulty in a specific
area of arithmetic, as, for example, in multiplication of whole
numbers, the teacher may administer a test designed for
intensive analysis of this area.¹³ Since a test so designed
provides numerous items related to each skill required in the
process under study, the results are more clearly indicative
of the specific difficulty the pupils are experiencing than are
the results of more generalized tests.

¹² E. W. Tiegs and W. W. Clark, California Achievement Tests,
Intermediate Battery, Los Angeles: California Test Bureau, 1950.
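The effect of test length on reliability can be illustrated with the Spearman-Brown formula, which is not named in this chapter and is introduced here only as a sketch. It assumes that the items removed are comparable to those retained, and the 60-item test with a reliability of .90 is a hypothetical example.

```python
def spearman_brown(reliability, length_ratio):
    """Predicted reliability when test length is multiplied by length_ratio.

    Assumes the items added or removed are comparable to the rest.
    """
    return (length_ratio * reliability) / (1 + (length_ratio - 1) * reliability)


# Hypothetical survey test: 60 items with a reliability of .90.
full_test_items = 60
full_test_reliability = 0.90

# Predicted reliability of progressively smaller groups of items.
for subset_items in (30, 12, 6):
    ratio = subset_items / full_test_items
    predicted = spearman_brown(full_test_reliability, ratio)
    print(f"{subset_items:2d} items: predicted reliability {predicted:.2f}")
```

Under these assumptions the predicted reliability falls steadily as the number of items shrinks, which is why a handful of items on one process can suggest a difficulty but cannot establish it.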

Although this discussion has been focused upon the use of 
standardized achievement tests, adequate analysis and diag- 
nosis of educational problems ordinarily involves considera- 
tion of the mental abilities, educational background, health 
and physical status, environment, and emotional status of the 
pupil. Analysis of achievement-test results, however, may play 
an important role in educational diagnosis and the establish- 
ment of effective instructional and remedial procedures. The 
value of the tests as tools in this process will reside in the skill 
with which the test results are utilized by the teacher in form- 
ing hypotheses to guide his work with his pupils. 

SUMMARY 

Perhaps the most widely used instruments of educational 
evaluation are achievement-test batteries, which are designed 
to provide a general survey of attainment of academic skills 
and knowledges. For the teacher, the basic consideration in
the selection and utilization of such test batteries is the extent
to which the results of testing reflect pupil progress relative
to worthwhile educational goals.

¹³ L. J. Brueckner, Diagnostic Tests and Self-helps in Arithmetic,
Los Angeles: California Test Bureau, 1955.

In selecting and preparing to use a standardized achieve- 
ment test, the teacher should examine the organization of the 
test and the various types of skill, knowledge, and thought 
required of the examinee. This study will help the teacher to 
relate the results to the educational goals which he has formu- 
lated for his pupils. The test is in reality a tool which may be 
useful to varying degrees as an instrument providing evalua- 
tive data. Its value is largely dependent upon the skill with 
which it is used. 

The tables of norms which accompany standardized 
achievement tests provide the teacher with a basis for in- 
terpreting the test results. Grade norms or grade equivalents 
indicate roughly the status of the pupil with reference to scores 
typical of various grade groups. Percentile norms make it pos- 
sible for the teacher to compare the pupil with others of the 
same grade or age who comprised the standardization popu- 
lation. Age norms, or age equivalents, provide a basis for 
comparison of pupil performance with that of various age 
groups. Where standard scores are presented, the teacher is 
provided with scales representing equal units of measurement. 
Such scales are of value because they make possible com- 
parisons among test batteries. 
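A short sketch may make the difference between percentile norms and standard scores concrete. The scores below are hypothetical, and the midpoint convention used for the percentile rank (counting half of the tied scores) is one common choice rather than the rule of any particular test manual.

```python
from statistics import mean, pstdev

# Hypothetical raw scores of a small norm group.
norm_group = [12, 15, 15, 18, 20, 21, 21, 23, 25, 30]


def percentile_rank(score, scores):
    """Percent of the group scoring below the given score, counting half of the ties."""
    below = sum(s < score for s in scores)
    equal = sum(s == score for s in scores)
    return 100.0 * (below + 0.5 * equal) / len(scores)


def z_score(score, scores):
    """Standard score: distance from the group mean in standard-deviation units."""
    return (score - mean(scores)) / pstdev(scores)


for raw in (15, 21, 25):
    print(raw, percentile_rank(raw, norm_group), round(z_score(raw, norm_group), 2))
```

The percentile rank reports only the pupil's position in the group, whereas the standard score expresses the same raw score in equal units, which is what permits comparison from one test to another.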

Although various methods of relating general ability and
achievement have been proposed, such comparisons are
generally hazardous. Some relationship exists between
measured ability and achievement, but measured ability to learn
is not a sufficient base for establishing an expectancy of
achievement.

The teacher may conduct analyses and attempt to diagnose 
learning problems at varying levels of intensity. Analysis 
helps the teacher to formulate hypotheses as to possible causes
of learning difficulties and provides some basis for
instructional procedures designed to overcome them. The more
specific and reliable the analysis, the more likely it is that exact
sources of learning difficulties will be located. However, diag- 
nosis must ordinarily be based upon observations more com- 
prehensive in nature than the results of an achievement-test 
battery. Intensive study of achievement-test results may, how- 
ever, help the teacher derive maximum benefit from these in- 
struments as aids to instructional planning and procedures. 


STUDY AND DISCUSSION EXERCISES 

1. Suggest some possible reasons why measured ability and 
achievement are not so closely related as might be expected. 

2. Study a general-achievement battery and outline the educa- 
tional philosophy it appears to represent. 

3. Select a subdivision of an achievement-test battery and list 
the detailed skills and knowledges which pupils must possess in 
order to succeed in the selected test area. 

4. What do you consider to be the teacher’s responsibilities in
utilizing standardized tests of achievement in the classroom? 

5. What are some of the purposes which you as a teacher
might have in mind when planning to administer a general- 
achievement-test battery? 

6. Analyze the specific skills required of the pupil in
performing any one of the following functions: (a) two-column addition;
(b) multiplication of two-digit numbers; (c) punctuation,
including quotations; (d) locating significant physical features on a
map of South America; and (e) locating reference materials in an 
encyclopedia. 


SUGGESTED ADDITIONAL READINGS 

Buros, Oscar K. (ed.): The Fourth Mental Measurements Year- 
book, Highland Park, N.J.: The Gryphon Press, 1953. 

Includes a listing and evaluation of currently available tests of
educational achievement.

Freeman, F. S.: Theory and Practice of Psychological Testing,
New York: Henry Holt and Company, Inc., 1950, chap. 11. 

This general discussion of the measurement of educational 
achievement includes a presentation of samples of the contents 
of representative tests in this area. 

Gates, A. I., A. T. Jersild, T. R. McConnell, and R. C. Chall- 
man: Educational Psychology, 3d ed., New York: The Macmillan 
Company, 1948, pp. 543-552, 561-567. 

Presents a general account of the appraisal of pupil progress 
through the use of tests and includes suggestions for approaches 
to educational diagnosis. 

Goodenough, F. L.: Mental Testing, New York: Rinehart & Com- 
pany, Inc., 1949, chap. 22. 

In connection with this chapter the author presents a consid- 
ered account of the problems of relating ability and accom- 
plishment. 

Greene, E. B.: Measurements of Human Behavior, rev. ed., New 
York: The Odyssey Press, Inc., 1952, chap. 7. 

This chapter includes a discussion of the measurement of edu- 
cational achievement and presents examples of measurement 
possibilities in a variety of academic areas. 

Jordan, A. M.: Measurement in Education, New York: McGraw- 
Hill Book Company, Inc., 1953, chaps. 5-13. 

These chapters present a detailed account of methods and in- 
struments for the measurement of attainment in a wide variety 
of subject areas. 



CHAPTER SEVEN 


Estimating Readiness for Learning 


Children differ in their readiness to undertake learning tasks. 
To be effective, instruction must take into account individual 
differences in the abilities, skills, and knowledges which are 
requisite to successful learning experiences. In certain cur- 
ricular areas, many of these abilities, skills, and knowledges 
have been defined and readiness tests have been devised to 
measure them. Such tests provide the teacher with data about 
his pupils which are helpful in planning learning experiences. 


READINESS AND LEARNING 

The term readiness may be applied to any aspect of physi- 
cal, mental, emotional, or experiential maturity which is 
requisite to a learning task. A child learns best when he is 
“ready.” He does not learn, or learns slowly and with difficulty, 
when he lacks the necessary maturity. 

Although the readiness concept has been most commonly 
applied to school beginners, it holds for all grades and age 
levels and for all types of subject matter. For example, it is 
possible to define requisite abilities, skills, and knowledges 
in an area as specific as two-place division at the fifth-grade
level¹ or in such general areas as foreign languages, advanced
mathematics, or science. The prognostic or special-aptitude 
tests designed to predict success in certain of these curricular 
areas are closely related to the readiness tests which are 
utilized with school beginners. 

There appear to be optimal mental ages for learning such 
aspects of arithmetic as addition, addition and subtraction of 
like fractions, and long division.² A readiness test has been
found useful in predicting success or failure in arithmetic.³
Mental age has been demonstrated to be related to chances of
success in beginning reading, and the results of readiness tests
and rating scales designed to measure reading readiness
indicate likelihood of success or failure in this task.⁴

These experimental results serve to illustrate the fact that 
a detailed understanding of the requirements of the learning 
task, together with an accurate evaluation of the readiness of 
the learner, may be very valuable to the teacher in curricu- 
lum planning. This does not mean, however, that the teacher 
should stand idly by waiting for the children to become 
“ready” for learning. Such skills as arithmetic, reading, and 
writing develop only as a result of learning. Carefully planned 
instruction can often enhance readiness for learning in such 
areas.⁵ Evaluation of readiness is possible and readiness
programs can be planned in any area of learning in which the
requisite skills, abilities, and knowledges can be differentiated.
The manuals of directions for readiness tests frequently offer
suggestions which may form the basis for such programs.

¹ W. A. Brownell, “Arithmetical Readiness as a Practical Classroom
Concept,” Elementary School Journal, 52:15-22, 1951.

² C. W. Washburne, “Mental Age and the Arithmetic Curriculum,”
Journal of Educational Research, 23:210-231, 1931. See also L. B.
Ames and F. L. Ilg, “Developmental Trends in Arithmetic,” Journal
of Genetic Psychology, 79:3-28, 1951.

³ L. J. Brueckner, “The Development and Validation of an
Arithmetic Readiness Test,” Journal of Educational Research,
40:496-502, 1947.

⁴ William Kottmeyer, “Readiness for Reading,” Elementary English,
24:355-366, 528-535, 1947.

⁵ C. M. Scott, “An Evaluation of Training in Readiness Classes,”
Elementary School Journal, 48:26-32, 1947.

READINESS TESTS 

Like other tests, readiness tests may be general or specific 
in nature. In developing a readiness test, the test-maker ana- 
lyzes the learning activities involved in the subject area, at- 
tempting to define the components of the background requisite 
to effective learning. He then designs test scales and items 
which provide estimates of pupil performance in these areas. 
As with other standardized tests, a careful study is made of 
the results of testing, revisions may be made in the original 
test, and final forms of the test are developed. Norms are 
then established and indications of the practical values and 
possibilities are presented in the manual of directions. 

The Metropolitan Readiness Tests may serve as an example of a
general readiness test. They were designed to measure the traits
and achievements of school beginners that contribute to their
readiness for first-grade instruction,⁶ and they do not require
the ability to read. The tests include measures of
comprehension of language, including phrases, sentences, and
vocabulary; visual activities involving perception of similarities;
tests of number knowledge; and a copying test which provides
a measure of visual perception and motor control. The
authors believe that the results of the tests may be of value in
estimating readiness for reading, arithmetic, and writing. The
manual of directions presents evidence of the value of the tests
in predicting first-grade achievement as measured by the
Primary I Battery of the Metropolitan Achievement Tests.

⁶ Metropolitan Readiness Tests: Directions for Administration,
Yonkers, N.Y.: World Book Company, 1949, p. 1.




Illustrative Readiness Test Items*

From the Metropolitan Readiness Tests

Test 3. Information

9. Mark the thing to travel in across the ocean.
[Pictures not reproduced.]

Test 6. Copying

1. You are to copy every picture in this column.
[Pictures not reproduced.]

* Selected from Gertrude H. Hildreth and Nellie L. Griffiths,
Metropolitan Readiness Tests, Form R, Yonkers, N.Y.: World Book
Company, 1949. (Directions adapted from the manual of directions.)

A number of tests specific to reading readiness have been
devised. The Monroe Reading-aptitude Tests may serve as an
illustration of this type of instrument.⁷ The following tests,
none of which require reading, are included: 

1. Visual tests designed to detect perceptual reversals and 
to measure ocular-motor control and memory for forms. 

2. Motor tests of speed, steadiness, and writing. 

3. Auditory tests which indicate abilities in word discrim- 
ination, sound blending, and auditory memory. 

4. Articulation tests designed to evaluate speed and level 
of articulation ability. 

5. Language tests which include measures of vocabulary, 
classification, and sentence ability. 

6. Laterality tests which indicate hand, eye, and foot pref- 
erence. 


Administration 

The following rules should be carefully observed in giving 
a test: 

1. The teacher should be thoroughly acquainted with the 
tests and with the detailed instructions and directions given 
in the test manual before attempting to administer the tests. 

2. The test should be given in a quiet room and interrup- 
tions and disturbances should be avoided during testing time. 

3. Small children should be tested in small groups. 

4. It is important to test children at a time when they are
not fatigued or overly excited. Pupil attitudes in the testing 
situation are an important consideration in interpreting test 
results. It is therefore necessary to do everything possible to 
ensure a cooperative attitude on the part of the pupils being 
tested. 

5. Where group tests are used, the testing should be done
by the children’s own teacher. This is especially important in
the case of young children, who are sometimes disturbed by
the presence of unfamiliar persons.

6. Short testing periods are necessary for young children.
It is preferable to test over several short periods rather than
to attempt to administer the entire test at one long sitting.

7. Pupils should be seated comfortably and in such a
manner that they are not easily disturbed by one another.

8. In testing small children, the names and other data
required should be filled in by the teacher before the testing
begins.

9. The printed directions for administering the test should
be followed implicitly, since any marked deviation from these
instructions is likely to invalidate the results.

⁷ Marion Monroe, Reading-aptitude Tests, Boston: Houghton
Mifflin Company, 1935.

Interpreting the Results 

Norms are provided as the basis for interpretation of the 
results of readiness tests. These norms may be of various 
types. For example, the norms for the Monroe Aptitude Tests 
are presented in the form of percentile ranks. Norms of this 
type are presented for the total test and for each of the five 
scales, which include measures of visual, auditory, motor, ar- 
ticulatory, and language abilities. As a result of experience in 
using the test in connection with other measures, Monroe sug- 
gests interpretation of the results in terms of probable student 
status as superior, average, or retarded. She also proposes 
methods calculated to overcome a variety of difficulties which 
pupils might encounter in beginning reading.⁸

A somewhat different approach to interpretation makes use
of raw scores, presenting the “probable per cent of failure”
in terms of the test results.⁹ Still another test manual presents
percentile ranks, letter ratings, and readiness status and offers
interpretations of letter ratings and readiness status.¹⁰

⁸ Marion Monroe, Manual of Directions, Reading-aptitude Tests,
Boston: Houghton Mifflin Company, 1935.

⁹ M. M. Lee and W. W. Clark, Manual of Directions, Lee-Clark
Reading-readiness Test, Los Angeles: California Test Bureau, 1943.
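One way in which a “probable per cent of failure” might be derived from raw scores is sketched below. The follow-up records and the score bands are hypothetical, and the procedure is offered only as an illustration of the idea; it is not taken from any test manual.

```python
# Hypothetical follow-up records: (readiness raw score, failed beginning reading?).
records = [
    (5, True), (8, True), (9, False),
    (12, True), (14, False), (15, False), (18, False), (19, True),
    (22, False), (25, False), (27, False), (29, False),
]

# Hypothetical raw-score bands, given as inclusive (low, high) limits.
bands = [(0, 9), (10, 19), (20, 29)]


def percent_failing(records, bands):
    """Return a dict mapping each score band to the percent of its pupils who failed."""
    table = {}
    for low, high in bands:
        outcomes = [failed for score, failed in records if low <= score <= high]
        if outcomes:  # skip bands in which no pupils were observed
            table[(low, high)] = 100.0 * sum(outcomes) / len(outcomes)
    return table


table = percent_failing(records, bands)
for (low, high), pct in table.items():
    print(f"raw scores {low:2d}-{high:2d}: {pct:5.1f} per cent failed")
```

A table of this kind describes what happened to earlier groups of pupils. With so few cases in each band the percentages are unstable, and they indicate likelihood for a group rather than the outcome for any one child.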

As we have seen, test scores and norms provide valuable
information, but interpretations of test results must be based 
upon a recognition of the limitations of the measuring instru- 
ment. Readiness tests provide information covering a limited 
range of abilities and skills. Other factors which enter into 
readiness for a learning task must ordinarily be estimated by 
means of other devices or by observation. Furthermore, since 
it is difficult to get accurate test results with young children, 
the results of readiness tests used with preprimary or primary- 
grade children should be interpreted with due regard not only 
for the limitations of the instrument but also for the difficulties 
of testing young children. 


UTILIZING THE RESULTS OF TESTING 

In actual practice, readiness-test results provide the teacher 
with data that are valuable when used in conjunction with 
other information about the children in his classroom. This 
additional information might well include evaluations of men- 
tal age, emotional and social adjustment, health, vision, hear- 
ing, speech and language development, experiential back- 
ground, ability to solve problems, sense of sequence and re- 
lationships, attention, memory, motor ability, handedness, and 
eyedness. When used in conjunction with such data,
readiness-test results may be useful as (1) an aid in estimating the
readiness of a pupil to do the work included in the area of
testing; (2) an aid in grouping pupils for instructional
purposes; (3) an aid in analysis of instructional needs of a
preparatory nature; (4) an aid in estimating the strengths
and weaknesses of pupils in areas fundamental to the learning
task; (5) a guide to the teacher in adapting instruction to the
needs of the group and the individual; and (6) a source of
significant data useful to the teacher in planning work with
pupils.

¹⁰ G. H. Hildreth and N. L. Griffiths, Metropolitan Readiness Tests,
Yonkers, N.Y.: World Book Company, 1948.

Readiness tests provide at least a partial basis for estimat- 
ing the pupil’s chances of success in a particular area of learn- 
ing, for diagnosing specific weaknesses, and for helping the 
teacher to plan a course of preparatory learning. That is, if 
a pupil achieves a relatively low score on a test of word com- 
prehension, he would benefit from a program of experiences 
designed to increase ability in this area before he begins a 
reading program. A low score on a test of knowledge of num- 
bers or number concepts indicates the need for experiences 
in this area preparatory to studying arithmetic. Low scores on 
tests of vision or hearing may indicate the need for special 
medical examinations or for a program designed to improve 
visual or auditory skills. 

Although many of the skills required by school curricula 
may be developed to some extent by carefully planned instruc- 
tion, readiness is in part a matter of maturation rather than 
of learning, and maturational factors are only in part ame- 
nable to instruction. Hence the teacher may discover that cer- 
tain pupils in his group appear to be unable to profit sig- 
nificantly from a program directed toward preparation for a 
learning task. Certain general skills, however, may be im- 
proved where the activities involved are geared to the child’s 
capacity. The development of such skills may be enhanced 
through activities designed to encourage: 

1. Alert listening. 

2. Ability to follow instructions and directions. 

3. Keenness of observation, as in discrimination of
likenesses and differences, perception of form, quantity,
size, color, and so on.

4. Questions, conversations, evaluations, judgments, and
sharing of experiences. 

5. Development of meanings through varied planned ex- 
periences. 

6. Active attention, recall, and organization of meaning- 
ful materials. 

7. Development of skills of observation and interpretation 
of pictures, events, and materials. 

8. Development of skills in problem solving, planning, and 
construction. 

9. Development of motor skills related to the various
learning tasks.

Harrison¹¹ has proposed that reading readiness may be
influenced by instruction which fosters:

1. Extension of meaningful concepts. 

2. Extension of spoken vocabulary. 

3. Accurate enunciation and pronunciation. 

4. A desire to read. 

5. Correct use of simple sentences. 

6. Ability to do problematic thinking. 

7. Ability to keep a series of events in mind. 

Carter and McGinnis¹² present the following list of
activities to facilitate reading readiness:

1. Looking at picture books and telling stories suggested by 
the pictures. 

2. Dramatizing children’s stories. 

3. Telling short stories in response to questions. 

4. Listening to children’s stories and poems. 

5. Telling, in response to questions, what happened in a fa- 
miliar story. 

6. Making scrapbooks of pictures taken on a vacation trip;
telling group of trip.

7. Looking through magazines for interesting pictures and
later telling original stories about them.

8. Bringing interesting books and phonograph records from
home to be shared with the group.

9. Sharing juvenile books with others; understanding that
books belong to children and that they may be used by
them.

10. Telling of one’s own experiences before a group.

¹¹ M. Lucile Harrison, Reading Readiness, rev. ed., Boston:
Houghton Mifflin Company, 1939, p. 6, Fig. 1.

¹² H. J. L. Carter and D. J. McGinnis, Learning to Read, New
York: McGraw-Hill Book Company, Inc., 1953, pp. 63-64. By
permission.

Although these suggestions for preparatory activities have 
reference specifically to reading, they may, in principle, be 
applied to other areas of instruction. For example, activities 
designed to stimulate children’s interest in and develop mean- 
ingful concepts related to arithmetic could include dramati- 
zations, play stories, games, story telling, picture interpreta- 
tions, and manipulation of concrete materials, all involving 
quantities. In the fields of social studies and science, meanings
are developed in terms of experiences which can be
preparatory as well as directly instructional in nature.

The teacher, however, will recognize that certain aspects 
of readiness cannot be learned but are dependent upon mat- 
urational characteristics. That is, there is a period in a child’s 
development when it becomes possible for him to undertake 
certain tasks; prior to this time, these tasks may be difficult 
or perhaps impossible. Training in the absence of the required 
maturation is not effective, and a child may be as unprepared 
for some aspects of a readiness program as he is for the learn- 
ing task which is envisaged as the outcome of the training. 
However, school is not necessarily a loss for those children 
who seem to be markedly below the stage of readiness for 
some types of learning; they may be gaining in social skill and 
in familiarity with new surroundings or broadening their
experiences so that when they do become ready for reading and
arithmetic they will have somewhat fewer concomitant
adjustments to make.


SUMMARY 

Children differ in many characteristics that influence their
readiness to undertake the variety of learning tasks involved
in the school curriculum. In certain areas of learning,
particularly in the areas of reading and arithmetic, the specific tasks
have been defined and tests have been devised to measure the
extent to which the child possesses the required skills. Such
tests are generally called readiness tests, and most of them
have been devised for and used with school beginners. The
concept of readiness, however, does not necessarily apply to
this age group alone, nor to only one or two areas of learning.

Ideally, teacher estimates of readiness should include
mental, motor, social, and emotional maturity as well as the
specific abilities, skills, and knowledges required for the learning
task. It has been demonstrated, however, that the results of
readiness tests are predictive of success in such areas as
reading, and that readiness programs designed to prepare the child
for the learning task to come do increase the child’s chances
of success.

In his selection of a readiness test the teacher should be
guided by the needs of his group. The extent to which the
test will enable the teacher to analyze the skills, abilities, and
achievements involved in learning the skill will be an
important consideration. The teacher should create the best
possible conditions for administering the test and should follow
the standard directions implicitly if the results are to be
interpreted in terms of the norms and classifications included in
the test manual. Typically, readiness-test manuals provide the
teacher with many suggestions for the interpretation and
utilization of results.

If test results are to be used for more than screening
purposes, the teacher may plan a program of learning
experiences designed to develop the skills and knowledges which will
prepare the child for the more formal instruction to follow.
But many factors other than specific skills will probably enter
into the teacher’s consideration in planning such a program,
since maturational factors may impose a limitation upon the
effectiveness of specific instruction.


STUDY AND DISCUSSION EXERCISES 

1. Outline as specifically as you can the skills which you feel 
are involved in one of the following activities: writing, addition of 
simple fractions, map reading, library reference, use of the dic- 
tionary, or some other specific learning task in the area of your 
interest. 

2. What activities are you able to devise which might represent 
a readiness program for the learning task which you have analyzed 
above? 

3. Indicate the characteristics which you would study in esti- 
mating a child’s readiness for reading. Suggest the possible sig- 
nificance of each in relation to the child’s chances of success in 
your reading program. 

4. In what ways might the results of selected reading-readiness 
tests be utilized in developing a readiness program for school be- 
ginners? 

5. Suggest reasons for using mental tests as part of the readi- 
ness battery. If possible, validate your conclusions by reference to 
the literature on readiness. 

6. It has been suggested that teacher ratings are predictive of 
readiness. On what bases would you rate a child as to his possible 
chances of success in beginning arithmetic? 


SUGGESTED ADDITIONAL READING 

Adams, F., L. Gray, and D. Reese: Teaching Children to Read,
New York: The Ronald Press Company, 1949.

Chapters 4, 5, and 6 present a discussion on the nature and
development of reading readiness. A reading-readiness check list
is presented in chap. 4. 

Buros, Oscar K. (ed.): The Fourth Mental Measurements
Yearbook, Highland Park, N.J.: The Gryphon Press, 1953.

Pages 566-575 present information concerning general readi- 
ness and reading-readiness tests. 

Carter, H. J. L., and D. J. McGinnis: Learning to Read, New 
York: McGraw-Hill Book Company, Inc., 1953. 

Chapter 5 includes a discussion of activities designed to prepare 
children for reading. 

Greene, E. B.: Measurements of Human Behavior, rev. ed., New
York: The Odyssey Press, Inc., 1952, pp. 99-103.

These pages present a discussion of measures of reading readi- 
ness. 

Greene, H. A., A. N. Jorgensen, and R. Gerberich: Measurement 
and Evaluation in the Elementary School, New York: Longmans, 
Green & Co., Inc., 1942. 

Chapter 15 contains a discussion of readiness tests. 

Harrison, M. Lucile: Reading Readiness, rev. ed., Boston: 
Houghton Mifflin Company, 1939. 

This book deals with the problem of reading readiness,
describing tests and presenting suggestions for readiness programs.

Hildreth, G.: Readiness for School Beginners, Yonkers, N.Y.:
World Book Company, 1950.

Chapter 3 contains suggestions for exploring general readiness 
of school beginners. Chapter 4 includes a discussion of readi- 
ness tests and their uses. 

Monroe, W. S. (ed.): Encyclopedia of Educational Research, 
New York: The Macmillan Company, 1950. 

Pages 879-880 present a discussion of results of research in
readiness for reading, spelling, handwriting, and arithmetic.

Wood, B. D., and R. Haefner: Measuring and Guiding Individual
Growth, Morristown, N.J.: Silver Burdett Company, 1948.

Readiness and readiness tests are discussed in chap. 9 of this
readable text. 



CHAPTER EIGHT 


Appraising Personality 


Educational tests are customarily classified as ability, achieve- 
ment, or personality tests. The teacher must bear in mind, 
however, that fundamentally these different categories rep- 
resent simply different vantage points; the various tests pro- 
vide different views of the pupil. Intelligence or aptitude is 
intimately related to achievement, and both intelligence and 
accomplishment are limited aspects of the total personality 
of the child. In short, all techniques of measurement and eval- 
uation are approaches to the understanding of personality. 
Thus, personality appraisal is treated here in a separate chap- 
ter only for the sake of convenience, for the totality which is 
the child can be fragmented only in textbooks and in aca- 
demic discussion. 

THE MEANING OF PERSONALITY 

Personality is a term which designates the person as he be- 
haves in his characteristic milieu. It embraces what he is, was, 
and can or will be; it is what he hopes to be, loves, hates, 
fears, and is confident of, and how he works and plays. Because
of the inclusiveness of the concept, intelligence tests and
achievement tests are at best approaches to the understand- 
ing of the total personality, and we shall deal here only with 
the facets of personality that are especially important in the 
conduct of school life. 

Many of the significant aspects of personality concern rela- 
tionships with others. Ways of adjusting to others, ways of 
relating oneself to them, ease of communication, and trust or 
distrust of both intimates and strangers are significant aspects 
of personality. To measure this tangible and important facet 
of personality, test makers have devised social-adjustment 
questionnaires and inventories and personality schedules con- 
taining a large proportion of questions that sample interper- 
sonal relations. The popular concept of personality concerns 
this social aspect almost exclusively: the word is commonly 
taken to mean attractiveness to others; a person who is almost 
automatically liked by others is said to have a “wonderful 
personality.” This social phase of personality is indeed im- 
portant; the creative genius must communicate his ideas to 
others, and to the extent that he fails to do this he is popularly
thought to have a “poor” personality. Similarly, the
person who can establish easy contacts with others, even if he 
is of limited intelligence, is said to have an effective person- 
ality. Limiting personality to sociability has certain advantages 
where testing is concerned, but, as will be shown in the sec- 
tion entitled “Personality Inventories,” some difficult technical 
problems arise in evaluation. 

The following definition of personality emphasizes sociabil- 
ity: personality is the total pattern of behavior and behavior 
tendencies as they affect others; it is the adjustment of the in- 
dividual as it affects others. Such a definition introduces a dif- 
ficult problem in the evaluation of personality — namely, the 
fact that the teacher’s evaluation of the pupil, or any person’s 
evaluation of another, is dependent upon the perception of
the evaluator. One’s description of another reveals, at least to
some extent, what one is himself. This is true not only in 
verbal descriptions but also in instruments designed to ap- 
praise personality. Specifically, an instrument for giving an 
indication of adjustment reflects such factors as the interests, 
scholastic background, and experience of the author of the 
device. It is extremely important to keep this in mind as we 
deal with personality appraisal, because it will help us to ex- 
ercise the proper restraint in interpreting the instruments that 
purport to measure so multifaceted a concept as the human 
personality. 

Some specialists define personality as a degree of consist- 
ency — the extent to which a person may be depended upon 
to behave in specific ways in his day-to-day conduct. Others 
refer to this degree of consistency as “character.” The distinc- 
tion is academic, however, because in the larger sense, char- 
acter, like intelligence and achievement, is one of the many 
aspects of personality. Regardless of whether we call it char- 
acter or personality, the element of consistency is important. 
In fact, the whole object of personality appraisal is to predict 
what the individual is likely to do — how he is likely to be- 
have, what situations probably will upset him — so that we 
may help him more effectively. If it were not for this con- 
sistency, there would be little chance of predicting probable 
reactions. Thus the purpose of instruments for appraising per- 
sonality is to determine those consistent elements by asking 
the subject or persons who know the subject what his re- 
sponses to certain situations have been. His future actions are 
predicted on the basis of his answers. 

Personality refers to inner, unobserved motives and pro- 
clivities as well as to external, observable behavior. The im- 
portance of the “inner man,” the “private world” of the indi- 
vidual, is indicated by the fact that a substantial amount of 
mental ill health, or personality disintegration, is justifiably
attributed to the individual’s lack of understanding of his
“true self.” Misunderstanding between persons is often due to 
the difficulty of communication between these inner but basic 
and fundamental selves. Hence, another significant approach 
to the study and understanding of personality is the acquiring 
of clues to the nature of these hidden aspects of attitude and 
conduct. 


Misconceptions in Appraising Personality 

Some obvious misconceptions concerning personality need 
to be briefly mentioned. Many people still fail to recognize 
the fallacy of categorizing or “typing” personalities without 
allowing for “in betweens,” in spite of the conclusive evidence 
of psychology and sociology. The idea that there is a relation- 
ship between hair color and temperament has been found to 
be erroneous, yet one frequently hears personality interpreted 
according to this misconception. The fact that there is no 
connection between personality characteristics and inherent 
racial factors has been demonstrated in psychology and an- 
thropology, but one still hears unenlightened references of 
this type to Negroes, Mexicans, Japanese, and other groups. 
Sometimes character traits are associated with religious differences,
e.g., the alleged selfishness of the Jews. Data from careful
research point to the fact that correlations between race or re- 
ligion and character traits are so low as to render invalid any 
inferences regarding individuals that are derived from such 
generalizations. It is well known that, aside from cultural 
factors, the differences between races are slight. The safe 
conclusion is that differences between races and religious 
groups are much slighter than are the differences within 
them. 

The fact that one cannot judge personality from appear- 
ance remains for many teachers mere academic knowledge, 
for teachers still refer to “bright-looking children” or “obvious
dullards.” There are two reasons why teachers should
avoid appraising personality on the basis of appearance. 
First, there are clinical “types” of mental defect, such as 
Mongolism, cretinism, hydrocephalism, and microcephalism, 
which have recognizable facial and bodily characteristics. But 
these recognizable features are not present inside the range 
of what are considered to be normal individuals. Thus we will 
miss the mark if we use appearance as the criterion when 
working with typical school children. Second, teachers, like 
other persons, tend to read into what they see what they want 
to see; hence, their judgment of personality, even after a pe- 
riod of acquaintance, must be cautious. 

The belief that there is an accepted norm for personality 
development is a misconception. Some educational and psy- 
chological literature creates the impression that extrovertism 
and sociocentric behavior should be the norm — norm in this 
case meaning a standard. Other competent scholars emphasize 
that “it takes all kinds” — that there is a place in society for 
both the extrovert and the introvert and for all those who 
come between the extremes. Some pupils are well adjusted 
even though they are not highly social or outgoing individ- 
uals. In a democracy, it is recognized that different individuals 
make their contributions to the total welfare of society in dif- 
ferent ways. The school might well take the position that the 
development of uniqueness (within limits, of course) is a defi- 
nite responsibility. Hence, as teachers we should not neces- 
sarily encourage all boys to be athletes, and we need not nec- 
essarily worry about girls who do not seem to enjoy dancing. 
Such differences among pupils need not disturb us so long 
as they do not display symptoms, or patterns of symptoms, of 
less-than-desirable adjustment. The defect of some standard- 
ized tests of personality is that their norms seem to imply that 
deviation from a hypothetical average is necessarily an undesirable
thing.

Another problem in dealing with personality is the difficulty
of defining traits. For example, different interpretations
are placed on “honesty,” “application,” “dependability,” and 
“adequacy of feelings of personal worth.” This difficulty con- 
trasts with the difficulty of defining intelligence. Intelligence 
has many facets, but most of them are recognized as aspects 
of intelligence. However, when personality traits differ in de- 
gree, they may become something else. Thus, self-reliance is 
an extension of dependence, but if it develops still further it 
becomes selfish egotism. As one outgrows submission, he be- 
comes ascendant, but if the characteristic develops still more, 
the individual is called domineering and with further devel- 
opment of the trait, tyrannical. Thus personality testing deals 
with varying degrees of different but intimately related traits. 
As difficult as it is to accept a measurement of intelligence as 
being valid and reliable, it is still more difficult to place 
credence in personality tests, since trait definition is even 
more elusive than definition of abilities. 

A misconception reflected in many tests is that personality 
is static. The fact is that personality is both complex and vari- 
able. People not only experience gradual change, but their ac- 
tions are variable within a few moments. This principle does 
not conflict with consistency in personality; the variability of 
behavior has been partially described in the statement that 
the manifestation of a trait is specific to a situation. For ex- 
ample, a person may be honest when it comes to shunning 
the use of his neighbor's answers on an examination, but he 
may not be honest when it comes to returning extra change 
he has received at the ticket window of a movie. One may be 
neat in the care of his room at home but exceedingly careless 
with the appearance of the spelling and arithmetic papers he 
presents to the teacher. Thus in using adjustment inventories 
it is well for teachers to bear in mind the difficulties involved 
in getting an accurate picture of “the total situation.”


The Fallacy of Types

There seems to be a well-nigh universal temptation to 
classify people. Such opposites as the good and the bad, the 
white and the black, the new and the old, the traditional and 
the progressive are indicative of this tendency. One of the 
early attempts at classification was a differentiation of body 
types and an attempt to parallel these types with personality 
characteristics. E. Kretschmer postulated three major types of 
body build with corresponding personality attributes — the 
pyknic, the asthenic, and the athletic. Periodic follow-up stud- 
ies have indicated that the “types” are actually continuous 
and therefore the classification is futile. Despite the experi- 
mental evidence, there are periodic recurrences of schemes 
for typifying. Dominance-submission and introversion-ex- 
troversion are not far from such older categories as san- 
guine, choleric, and phlegmatic. The attractiveness of the 
practice of “typing” personalities is exemplified in the re- 
marks of teachers characterizing pupils as being normal or 
abnormal, academic or mechanically minded, and friendly or 
hostile. 

As we have seen, evaluating personality on the basis of ap- 
pearance is dangerous precisely because of the element of 
truth involved. Similarly, “typifying” is dangerous because of 
the degree of validity inherent in the descriptions; there are 
introverted, sanguine, academically gifted, and mechanically 
apt persons. But there are also many people between the two 
extremes, and there are those who possess some of two or 
more characteristics, and there are different manifestations of 
a given trait in various situations. It follows that measures or 
evaluations of personality based on a bimodal, trimodal or 
even multimodal distribution should be interpreted with studied
caution. Although such questionnaires or scales have certain
values, those values do not necessarily reside in the fact
of classification. We may be aided in our understanding of 
children by these personality measures, but it is not because 
the pupils have been “typed.” 

PERSONALITY RATING SCALES 

A rating blank, scale, or schedule is a formal set of ques- 
tions asked of one person about another or a self-rating form 
in which the individual checks certain questions about him- 
self. The questions are answered in terms of the degree to 
which the individual has the trait or does the act described in 
the question. Thus, the question may be, “What is his (or 
your) attitude when facing difficult schoolwork?” Answers 
may be arranged along a continuous line with a mark indicat- 
ing divisions between very poor, poor, average, good, or ex- 
cellent. Such evaluations are, however, considered to be too 
vague to be maximally useful, and descriptive phrases are be- 
lieved to lead to greater accuracy. Thus the item, “How ef- 
fectively does he apply himself to an activity?” is answered in 
a weighted scale, allowing a certain number of points for each 
answer. These answers range from “(1) Shifts about in ran- 
dom fashion,” through “(3) Sticks to an activity until some- 
thing more interesting is presented,” to “(5) Voluntarily pur- 
sues an activity for two or more days consecutively.” Many 
of the more recent rating scales use the more precisely de- 
scriptive approach, which has the advantage of strengthen- 
ing the objective element in the scale. 
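
As a minimal sketch of how such a weighted scale might be tallied, the Python below gives each descriptive answer its point value and sums the points over the items rated. The item key, the function name, and the sample rating are hypothetical. Only the descriptions for points 1, 3, and 5 come from the text; points 2 and 4 are left undescribed because the text does not give them. In keeping with the cautions of this chapter, the total is a tentative aid to observation and not a diagnosis.

```python
# Hypothetical sketch of a weighted (point-valued) rating scale item.
# Only the descriptions for 1, 3, and 5 are taken from the text;
# points 2 and 4 are intermediate steps the text does not describe.
APPLICATION_ITEM = {
    1: "Shifts about in random fashion",
    2: None,
    3: "Sticks to an activity until something more interesting is presented",
    4: None,
    5: "Voluntarily pursues an activity for two or more days consecutively",
}


def score_ratings(ratings, scale_points=(1, 2, 3, 4, 5)):
    """Sum the points checked for each item; reject values off the scale."""
    for item, points in ratings.items():
        if points not in scale_points:
            raise ValueError(f"{item!r}: {points} is not a point on the scale")
    return sum(ratings.values())


# Illustrative rating of one pupil on a single item (assumed data).
ratings = {"application": 3}
total = score_ratings(ratings)
print(total, "-", APPLICATION_ITEM[ratings["application"]])
```

Keeping the descriptive phrase beside each point value, as the text recommends, lets the rater check a described behavior and leaves the arithmetic to the tally.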

Rating scales are available for classroom use from the 
nursery and kindergarten level through the college level and 
are sometimes used in business and industry. By means of 
them, many different aspects of personality can be investigated:
there are, for example, scales measuring ascendance-submission,
behavior maturity, self-adjustment, delinquent
and predelinquent behavior, attitudes, interests, and social 
adjustment. Sometimes the schedule includes several kinds of 
situations under a single heading; for example, an adolescent 
rating schedule includes fear, family emotion, family author- 
ity, feeling of inadequacy, nonfamily authority, maturity, 
escape, neurotic traits, and compensation. 1 

Teachers who wish to use personality rating scales are ad- 
vised to consult the current volume of the Mental Measure- 
ments Yearbook, where they will find descriptions of the 
kinds of behaviors which are supposed to be analyzed and the 
levels for which the schedules are specifically designed. More 
pertinent still, the instruments have been critically examined 
and carefully evaluated by scholars in the measurement field. 
The reading of these appraisals will help teachers to come to 
an evaluation regarding each scale which will enable them to 
use the results most accurately and effectively. 

A number of precautions must be observed in using per- 
sonality schedules; some of these were anticipated earlier in 
the chapter. (1) It is just as difficult to formulate a precise 
definition of the traits that are evaluated by means of the 
scales as it is to define personality. (2) There are no widely 
accepted norms for what should constitute desirable behavior. 
(3) The “specificity” of behavior makes it unlikely that the 
demonstration of a particular trait in one situation will be an 
accurate sample of that same trait as it might appear in an- 
other context. (4) The element of and danger from subjec- 
tivity is an ever-present complicating factor. The last seven 
words in the following statement, taken from an evaluation 
in the Mental Measurements Yearbook, are probably per- 
tinent to all the scales and inventories available at present: 

“. . . being little better or worse than the average personality
questionnaire of its kind, this inventory makes up for
none of the serious limitations still inherent in these instruments.” 2

1 Cowan Adolescent-adjustment Analyzer: An Instrument of Clinical
Psychology, Salina, Kans.: Cowan Research Project, 1946.

The necessity for caution in the use of an instrument should 
not cause teachers to repudiate it entirely. Rating scales can be 
used to advantage if teachers will observe the following pre- 
cautions: (1) Children should not be labeled predelinquent, 
neurotic, or poorly adjusted as a result of their scores or 
standing on a rating scale. Because of the likelihood of change 
and growth, the importance of the subject’s mood when he 
answered the questionnaire, and the possible influence of the 
mood of the person who interprets the results, the scores 
should not be placed in a permanent record folder. The ques- 
tionnaire may, however, be used by the teacher for a tem- 
porary and tentative evaluation. (2) Specific items on the 
questionnaire may serve to direct the teacher to a further in- 
vestigation of behavior in a particular area; that is, an atypi- 
cally answered item may suggest other questions that will lead 
to a better understanding of the individual. (3) The teacher 
should bear in mind that the rating scale does not constitute 
a diagnosis. It may supply some data which will make effec- 
tive diagnosis possible, but in the final analysis the individual 
items on the scale and the total score must be interpreted by 
the user of the scale. (4) The data obtained from the rating 
scale should not be regarded as conclusive or infallible. 
Rather they should be regarded as supplementary information 
which provides a test of the validity of data or conclusions ob- 
tained from the teacher’s observation of the child. 

The need for the exercise of these precautions is indicated 
in the following statement: 3

Such devices vainly seek the pot of gold at the end of the rain-
bow: a simple, cheap, foolproof method for studying human per-
sonality. Teachers, administrators, and school counselors who are
tempted to consider the use of such devices would be benefited by
a psychological insight into the fact that their own great need to do
something about personality problems leads them to the delusion
of accepting instruments of very low objective value.

2 Albert Ellis in Oscar K. Buros (ed.), The Third Mental Measure-
ments Yearbook, New Brunswick, N.J.: Rutgers University Press,
1949, p. 69.

3 Laurance F. Shaffer in Buros, op. cit., p. 56.

Rating schedules must be used with consideration for their 
inherent limitations; hence conclusions based upon them must 
be temperate and tentative. 


PERSONALITY INVENTORIES 

A personality inventory is a questionnaire on which the 
subject checks his reactions to a number of specifically de- 
scribed situations. He may be asked how he typically reacts, 
how he thinks he would feel in specific situations, or whether 
certain events have occurred in his life. Examples of each of 
these types of questions are: “Do you cross the street to avoid 
meeting someone whom you dislike?” “At an automobile 
wreck, would you get sick at the sight of blood?” “Have you 
been knocked unconscious by a blow on the head?” 

Many other situations and classes of situations are 
“plumbed” in an inventory. No one question is considered
crucial; it is the total response to all the questions — the pat- 
tern of the answers — that is considered significant. If the re- 
sults are not interpreted too specifically, the general trend of 
personality orientation indicated is helpful, but as in the case 
of rating schedules, the temptation to label or classify should 
be avoided, even though the enthusiastic test maker may him- 
self have classified the results. 

Some of the difficulties involved in devising instruments to 
probe personality have already been suggested. Among these
difficulties are (1) the vagueness of the term, (2) the fact that
the personality orientation of the test maker creeps into the 
questions he asks, (3) the lack of a well-defined norm for 
social and personal behavior, and (4) the variability of be- 
havior in diverse situations. 

Subjectivity endangers several aspects of personality meas- 
urement: not only is the interpreter of the test subjective, but 
so inevitably is the individual taking the test. This aspect of 
personality testing is of importance to us here because, thus 
far, a negative view of personality testing has been presented 
in this chapter, and the teacher has the right to ask, “If in- 
ventories and scales are so subject to criticism, should they 
be used at all?” Actually, the subjective nature of inventories 
and scales points up the advantages of what are called projec- 
tive techniques, as we shall see later. 

As we have seen, each personality is a “private world.” As 
the individual grows and develops he learns certain tricks for 
protecting himself from “the slings and arrows of outrageous 
fortune” — for defending himself from the psychological and 
physical batterings which even a protected existence entails. 
Critics of extreme behavioristic psychology have pointed out 
that individuals do not react to stimuli in a simple, mechan- 
ical fashion; rather, each individual has a unique response. 
The late J. S. Plant described this private world as follows: 4 

Between the need of the child and the sweep of social pres- 
sures lies a membrane — a sort of psycho-osmotic envelope of 
transcending importance. . . . One should never think of this as 
a tangible, material structure. It is rather a property of that part 
of the personality which is in touch with the environment. 

It is only through the operation of the envelope that we can get 
at the problem of meaning — what anything “means” to the individual. . . .
Certainly one of the most brilliant of the psycho-
analytic contributions has been the theory that one sees the world
only as he can afford to see it — that the material of the environ-
ment is sensed by the personality only in terms of the problems
which it is trying to work through.

4 James S. Plant, The Envelope, New York: The Commonwealth
Fund, 1950, pp. 2-3. By permission of the Harvard University Press,
publishers.

This “envelope” which protects the individual from hostilities 
and contributes to his uniqueness is in turn protected by the 
individual, who practices a measure of self-concealment. Often 
when he does wish to reveal himself he is unable to think 
clearly enough about his feelings to verbalize or describe 
these inner workings. 

Thus, in terms of objectivity, questionnaires suffer from 
two inescapable shortcomings: (1) the inability of the in- 
dividual to evaluate with accuracy his own feelings, and (2) 
the individual’s desire to keep his feelings to himself. A third 
shortcoming operates certainly in the upper grades and at the 
high school level, and perhaps even earlier: (3) the desire
deliberately to mislead others. The motive for such behavior 
may not be negative; it may simply be a desire to please the 
teacher, for example. 

Inventories, like scales, should be used only with a proper 
regard for their limitations. They can be used to supplement 
other measures and observations and to help the teacher in- 
vestigate and gain some understanding of a particular area of 
adjustment, such as schoolwork and family or peer relations. 
Atypical responses on a questionnaire may serve as a point 
of departure for a fruitful interview. The teacher should re- 
member, however, that the scores on an inventory do not con- 
stitute a diagnosis. The following statement applies to several 
personality measures that attempt to define behavior precisely 
and categorically: “The worst features of the tests, in the 
opinion of this reviewer, are the elaborate suggestions to
teachers for the treatment of conditions claimed to be revealed
by the scores, profiles, and even individual item responses.
When not clearly dangerous, these procedures are stereotyped,
superficial, and lacking in clinical sense.” 5

INFORMAL APPROACHES TO PERSONALITY EVALUATION

Some approaches to personality assessment are valuable 
because they are admittedly subjective and users cannot escape
the pervasiveness of the subjectivity. Because of the
obvious presence of the personal element, there is much less 
danger that the tester will think he has an accurate measure 
of personality than might be the case with scales and inventories
in which norms have been cited. These fruitful methods
of personality evaluation are (1) anecdotal records, (2)
teacher-pupil conferences, and (3) staff meetings. 

The anecdotal record is an attempt to “catch” the child 
in a word picture when he is his typical or average self. The 
teacher describes, without attempting evaluation or interpretation,
a particular youngster as he is performing some characteristic
action. The anecdote is designed simply to indicate
to the teacher, at a later date when evaluation of the child’s 
growth is desired, what the child was like at a certain time. 
The child’s next teacher may use the anecdote, along with 
other data, to get a more complete picture of the child as he 
has been in the past. Certain precautions should be observed 
in making and using anecdotal records, however; the de- 
scribed action should be a typical one (teachers sometimes 
make the mistake of picking the strange or bizarre action to 
record), and interpretive and evaluative terms should be 
avoided in the wording of the behavior descriptions. A good 
method for making an anecdotal record is to decide at the be- 
ginning of the day to record the behavior of Albert B. at 
ten o’clock and the behavior of Patricia L. at 2:30. In this
way the teacher can gradually acquire anecdotes of typical
behavior for each pupil in his group. The outcome might be
something like the following: 6

September 30. Jackie has been paying special attention to Elsie
the past few days. He put a piece of bubble gum on her desk, put
his hands into his pockets, cast his eyes up to the ceiling, walked
a few steps away, whistling between his teeth. Elsie took the gum,
raised her eyes, lowered them, said nothing; but Jackie seemed
satisfied. He has been trying to give her clean notebook paper
every day.

October 6. The class chose Jackie and Mort to keep our part
of the grounds this week. Both stayed in at recess, so the girls
picked up the paper for them. Mort asked what to do about
being grounds monitor. Jackie said: “If the kid is littler’n you,
make him pick it up. If the kid is bigger’n you, report him to the
teacher.”

5 Douglas Spencer in Buros, op. cit., p. 58.

Teacher-pupil conferences in which the teacher does a 
great deal of listening are an excellent way of gaining under- 
standing of a pupil’s personality. In these conferences the 
teacher should play the role of “counselor with” rather than 
“adviser to.” A questionnaire may tell how a person typically 
acts or how he has behaved in the past, but a conference in 
which the teacher listens at least part of the time will produce 
much more information about why the pupil behaves as he 
does. The difficulty with the technique is that teachers tend
to give too much advice, although analysis of their successful 
experiences in working with pupils shows that solutions were 
discovered only after they had gained, through listening, an 
understanding of how the pupil felt about his difficulties. At 
every age, pupils talk freely with teachers who are patient 
listeners. Often teachers are in such a hurry to get results that
they fail to encourage the development that will help the pupil
to understand his own personality.

6 Helen Bieker in Fostering Mental Health in Our Schools, 1950
Yearbook, Association for Supervision and Curriculum Development,
Washington: National Education Association, 1950, p. 189.

One relatively unexplored but highly fruitful way of se- 
curing a better understanding of individual children is the con- 
ference composed of a small number of teachers. In these 
conferences, a teacher mentions the name of a pupil he would 
like to help on a professional basis. Teachers who have had 
the pupil previously will be able to suggest helpful approaches 
to and reveal their insights into his problems. Frequently 
teachers who do not know the pupil concerned will make 
important contributions to such conferences, since a teach- 
er’s experience and knowledge can sometimes be most fruit- 
fully brought to bear when he does not know the pupil. One 
of the authors has tried this technique frequently by reading 
data on a particular case to a group of teachers; the suggested 
approaches to the problem involved have often been prac- 
tically the same as those suggested by psychologists on the 
basis of the test data and interviews. 

These informal approaches (anecdotal records, teacher- 
pupil conferences, and teacher conferences) are especially 
valuable because they do not promise a miraculous conclu- 
sion. They are admittedly subjective; hence there is more like- 
lihood that appropriate allowance will be made for subjec- 
tivity. The advantage of these techniques over formal instru- 
ments is that teachers can base their subjective judgments and 
evaluations on objective data. 

PROJECTIVE TECHNIQUES 

As we have seen, personality is, at least in part, the private 
world of the individual. To the extent that this is true, one 
must, in order to understand the vast realm of emotional experience,
study the individual when he is off guard. Various
projective techniques provide fruitful approaches to this
aspect of personality. Many projective techniques have the
double advantage of providing some degree of therapy during 
the process of study or analysis, for as the child carries on the 
activities that will be observed and interpreted, he is also 
getting rid of some of the tension that is complicating life 
for him. 

A projective technique involves a situation which is mean- 
ingless, ambiguous, amorphous, or neutral. What the person 
being tested does or sees in these meaningless circumstances is 
not dictated by external questions, directions, or demands; his 
actions are an expression of himself. The meanings which he 
believes are present in the pictures or stories are meanings 
which he puts there himself. 

One of the earliest and most widely used projective tech- 
niques, the Rorschach test, presents a series of ink blots such 
as could be made by allowing a drop of ink to fall upon a 
paper from a height and folding the paper over in such a way 
as to produce symmetrical halves. Some of the blots in the se- 
ries are black, some have many colors; being formless, they rep- 
resent nothing. The subject is asked what he sees in them; 
what he reports is, obviously, a projection of himself. The 
scoring and interpretation of the Rorschach blots are involved, 
extensive, and time-consuming processes which require highly 
specialized training. Research is still going on, but there is no 
indication at present that this process will become a routine 
classroom technique. The untrained individual must be warned 
against the dangers of uninformed and irresponsible interpre- 
tation of responses to projective techniques. 

Other projective techniques include a cloud test, in which 
each member of the group tells what he sees in a pictured 
cloud formation, much as children describe their castles in 
the air; a sentence-completion test, in which the subject is
presented with the first part of a number of sentences and is
asked to complete them in any way he sees fit; and a story-completion
test, in which part of a story is read to the subject,
who is then asked to tell what happened in the rest of
the tale. 

Play techniques consist of giving the child a few toys to 
play with and observing what he does with them, or what he 
has the toys and dolls do. An important element in play tech- 
niques is a high degree of permissiveness (i.e., the child is 
made to feel that there are no important compulsions or re- 
straints being placed upon him) which cannot practically be 
made a part of the classroom situation. However, the princi- 
ples of play techniques are useful to the teacher in that the 
child gives a picture of his inner self when he is engaged in 
spontaneous play, either alone or with others. By cautiously 
interpreting this behavior and evaluating it against other data, 
the teacher can see more clearly specific aspects of the child’s 
personality. The teacher might well be advised to see that in- 
terference with what to the adult are objectionable aspects of 
play is held to a minimum, thus allowing the child some 
chance to “spill over” with some of his hostile or frustrated 
feelings. 

Some practices which, when used informally, we should 
perhaps not call projective techniques can be put to imme- 
diate use by teachers in understanding personality orienta- 
tions. One of these is free or creative writing. The pupil is 
encouraged to write whatever he likes — stories, poems, biog- 
raphies, or articles — and criticism of content and composition 
is kept to a minimum. When sound rapport exists between 
teacher and pupil, trends will frequently appear in the writing 
that will serve as diagnostic aids to the teacher. No great re- 
liance should be placed on single bits of writing; it is the 
recurrent theme that is important. Since some youngsters have 
difficulty in thinking of what to write, suggestions may be 
made: My Favorite Pastime, My Pet Peeve, My Ideal Boy
Friend, My Kind of Father, etc. As with play techniques, 
data from free writing should be interpreted cautiously and 
should be regarded as supplementary information. 

Fingerpainting is a favorite technique of many classroom 
teachers for getting revealing glimpses of pupils from the be- 
ginning of their school experience. Some pupils are reluctant 
to participate, and even this reluctance, in conjunction with 
other information, may be revealing or suggestive of person- 
ality trends. The kinds of color, the kinds of strokes, and the 
degree of freedom of movement and care employed all may 
give the teacher clues to the meaning of behavior. The au- 
thors do not recommend direct interpretation of these features 
of the paintings, however; they are clues only. Quite apart 
from analysis, many teachers have found that children talk 
more freely when they have a picture to which to point; that 
is, a child may be unable to discuss a feeling such as a resent- 
ment, but he may be able to paint it and describe what he has 
painted. 

Working with clay is another projective technique. As with 
paint, the characteristic way of dealing with the medium, the 
vigor of movement, and the degree of satisfaction or discon- 
tent with the product are all elements that might be involved 
in the interpretation. 

The advantage of projective techniques, in the main, is that 
they have not become routine, stereotyped, and standardized. 
There is, of course, the danger that the user will project him- 
self into the interpretations and conclusions, but the teacher 
who attempts to interpret what he sees in a child’s writing, 
play habits, and art processes and products fully realizes that 
the interpretation is wide open to error; consequently he is 
careful in its use. Such data, cautiously used, may frequently 
be more helpful in the evaluation of personality than the re- 
sults of standardized instruments that give apparently accurate 
statistical interpretations of data that are necessarily subjective
and approximate. 1


SUMMARY 

All instruments for measurement or evaluation are in reality 
used as approaches to the understanding of some aspect of 
personality. The word personality is an inclusive term embrac- 
ing actions, inner feelings, and what others think of one. It is 
obvious that a thing so complex and ever-changing cannot 
readily be measured in a mathematical sense. The difficulty 
of evaluation, however, should not lead the teacher to a re- 
pudiation of the available instruments. Rather, an understand- 
ing of the complexity of the problem should underscore the 
need for proper caution and reservation of final judgment. 

Rating scales are designed to systematize judgments or ob- 
servations regarding oneself or others. The shortcomings of 
the scales are that they are subjective and must of necessity 
be limited by the particular questions asked — the things the 
scale maker thinks are most significant. If the teacher uses 
them with these limitations in mind, they provide useful cor- 
rective or corroborative information. If, however, conclusive 
judgments are based on the attractive norms, the results will 
be unfortunate for many pupils. 

Personality inventories are subject to limitations similar to 
those of rating scales. They have their place in evaluation 
schemes, where they may be used as the starting point for an 
interview or as supplementary data. If the teacher finds himself
attracted by the statistical norms which sometimes accompany
such tests, he would do well to heed the words of
George G. Thompson: “[It] is surprising (in the face of this
preponderance of negative research findings) that these personality
questionnaires should continue to be so widely used
in school and youth-guidance organizations!” 3

1 Another approach to the understanding of personality (functioning
in a group, or sociometry) will be examined in Chap. 8. This
technique is available to classroom teachers without any outlay for
equipment save perhaps a book which explains in detail some of the
problems involved in the approach.

Some recent developments in personality evaluation give 
indications of overcoming some of the defects of older meas- 
ures. Inclusively, these tools are called projective techniques. 
They include art, free writing, and spontaneous play used for 
the purpose of gaining an understanding of children. One ex- 
planation of the value of these instruments is that they are ad- 
mittedly subjective and approximate. They unearth clues or 
furnish supplementary data. Specifically, although one would 
not be justified in concluding from a child’s drawings that he 
has a mother fixation, one can discover signs of emotional 
tensions that should be more carefully studied in home visits, 
interviews, and further psychological investigation. 

The evaluation of personality is an inescapable responsi- 
bility of the school, since evaluation must precede construc- 
tive help. The instruments available today for evaluating per- 
sonality are tools for increasing the accuracy of the teacher’s 
perception, just as the stethoscope increases the accuracy of 
the doctor’s diagnosis. The fact that personality instruments 
are imperfect indicates only that they should be used with 
appropriate regard for their shortcomings, for they provide 
a means of arriving at a tentative evaluation of certain aspects 
of the child’s personality. 

3 George G. Thompson, Child Psychology, Boston: Houghton Mifflin
Company, 1952, p. 614.

STUDY AND DISCUSSION EXERCISES

1. What is the significance for teachers of the statement, “When
one describes the personality of another, he reveals himself”?

2. Point out some instances in typical everyday conversations
which indicate the tendency to classify persons as personality
types.

3. Study the reviews of three or four well-known personality 
inventories, using the Mental Measurements Yearbook. Do the re- 
views reflect or contradict the views presented in this chapter? 

4. Formulate a list of suggestions which would help teachers to 
use their own subjective evaluations of pupils more constructively. 

5. Consult the Education Index and find and report on some 
articles published in the last six months having to do with the use 
of projective techniques by classroom teachers. 

6. Which would you consider to be more important for a boy 
having difficulty in social adjustment at school — a factual study 
of home and community or an interview which reveals how he 
feels about his home and community? 

7. Evaluate this statement: Personality is formed in the first 
six years of life. 


SUGGESTED ADDITIONAL READINGS 

Bernard, Harold W.: Mental Hygiene for Classroom Teachers, 
New York: McGraw-Hill Book Company, Inc., 1952, pp. 297-362.

Three chapters deal with the role of writing in the release of 
tensions and interpretation of personality, with art as an ap- 
proach to understanding personality, and with play and drama 
as classroom techniques in pupil understanding. 

Bieker, Helen: “Using Anecdotal Records to Know the Child,” 
in Fostering Mental Health in Our Schools, 1950 Yearbook, Association
for Supervision and Curriculum Development, Washington:
National Education Association, 1950, pp. 184-202.

This is a condensed account of the aims, techniques, and ad- 
vantages of the anecdotal record. It provides background ma- 
terial which prepares one to experiment for himself. 

Cattell, Raymond B.: Personality, New York: McGraw-Hill Book 
Company, Inc., 1950, chap. 4. 

A scholarly description and evaluation of various techniques 
for testing personality. The discussion points up the difficulties 
inherent in the problem of personality assessment. 
Kaplan, Louis, and Denis Baron: Mental Hygiene and Life, New
York: Harper & Brothers, 1952, pp. 52-80. 

This chapter discusses the origin and meaning of personality. 
The uniqueness of personality, rather than the division into 
types, is described. 

Klopfer, Bruno, Mary D. Ainsworth, Walter G. Klopfer, and 
Robert R. Holt: Developments in the Rorschach Technique,
Yonkers, N.Y.: World Book Company, 1954. 

This book contains a detailed description of the technique and 
theory of Rorschach tests. It will be of interest to the student 
who wishes to specialize in clinical testing. 

Olson, Willard C.: “Personality,” in Walter S. Monroe (ed.), Encyclopedia
of Educational Research, rev. ed., 1950, pp. 806-817.

The greater part of this article is devoted to a critical examina- 
tion of the uses and shortcomings of methods for appraising 
personality. An extensive bibliography for further study is in- 
cluded. 

Thompson, George G.: Child Psychology, Boston: Houghton Mif- 
flin Company, 1952, chap. 14. 

Approaches to the evaluation of personality are discussed in 
terms of the theoretical constituents of personality and the kind 
of development that seems to be culturally expedient. 



CHAPTER NINE 


Evaluating Classroom Social 
Relationships 


In the classroom, the child is brought into contact with other 
children in a social situation which influences his academic 
achievements and his personal and social adjustment. One of 
the important tasks which face the child of school age is the 
development of satisfying relationships with his peers. Ade- 
quate relationships minister to the child’s need for social 
acceptance and the approval of his age mates. The child’s 
attitude toward life and learning in the school situation may 
be either favorably or unfavorably influenced by the nature 
of the social climate of the classroom. School learning cannot 
be isolated from the social setting in which it occurs. 

FACTORS RELATED TO SOCIAL ACCEPTANCE

Investigations of social acceptability among children of
school age have pointed to a number of considerations which
are of importance to the teacher. Children tend to select their
friends from their neighborhood and classroom groups 1 and
on the basis of similarity in chronological and mental age. 2
Physical condition, proficiency in playground activities, and
neuromuscular skill play a significant role in social acceptability
during the school years. 3

1 M. V. Seagoe, “Factors Influencing the Selection of Associates,”
Journal of Educational Research, 27:32-40, 1933.

Social acceptance is also related to scholastic achievement. 
For example, “best-liked” children are typically superior to 
unpopular children in scholastic ratings and in reading 
achievement. 4 It has been demonstrated, too, that children
tend to choose as friends those classmates who are somewhat
similar to themselves in mental age and scholastic achievement. 5
Children who have been retarded in school are frequently
among the unchosen individuals in the group and are
likely to display problems in social and emotional adjustment. 6

Social acceptance in the classroom is related to the personal 
and social characteristics of the individual. Popular children 
are typically more self-confident and emotionally stable than 
unpopular children 7 and evidence a greater degree of outgoing 
energy. 

2 R. Pintner, G. Forlano, and H. Freeman, “Personality and Attitudinal
Similarity among Classroom Friends,” Journal of Applied
Psychology, 21:48-65, 1937.

3 B. Grossman and J. Wrighter, “The Relation between Selection-Rejection
and Intelligence, Social Status, and Personality among
Sixth-grade Children,” Sociometry, 11:346-355, 1948.

4 M. C. Hardy, “Social Recognition at the Elementary School Age,”
Journal of Social Psychology, 8:365-384, 1937.

5 D. S. Belden, “A Study of the Nature of Social Structure” (unpublished),
Division of Research and Guidance, Los Angeles County
Schools, 1942.

6 A. A. Sandin, “Social and Emotional Adjustments of Regularly
Promoted and Nonpromoted Pupils,” Child Development Monographs,
1944, no. 32.

7 D. Baron, “Mental-health Characteristics and Classroom Social
Status,” Education, 69:306-310, 1949.

Thus the social status of a child among his peers is related
to developmental characteristics and environmental factors.
The fact that a child’s acceptance status tends to remain relatively
constant from year to year 8 indicates that problems of
social acceptance represent an area in which guidance is 
needed and which should be an important concern of teachers 
and parents. The relationships between social acceptability 
and scholastic and personality factors indicate that questions 
of promotion or nonpromotion, acceleration, reorganization 
of classroom groups, changes of school, neighborhood, or 
classroom may represent crucial decisions from the point of 
view of adjustment and learning. 

STUDYING SOCIAL RELATIONSHIPS 

In the course of his everyday activities the teacher has 
many opportunities to observe children working and playing 
together. When such observations become systematized and 
purposeful, the data which they provide are likely to become 
increasingly valuable. Scientists have developed a number of 
techniques designed to systematize the study of social relationships.
One of these techniques, the sociometric method, 9
is designed to facilitate the study of individuals in groups and 
is readily applicable to the classroom situation. The method 
involves the selection of associates for group activities. In the 
classroom, for example, pupils may be asked to choose seat- 
ing companions, group leaders, associates in specific activ- 
ities, guests for parties, and so on. Various techniques of re- 
cording, charting, and evaluating such choice patterns have 
been developed. 

8 M. E. Bonney, “The Relative Stability of Social, Intellectual, and
Academic Status in Grades II to IV and the Interrelationships between
these Various Forms of Growth,” Journal of Educational Psychology,
34:88-102, 1943.

9 J. L. Moreno, Who Shall Survive? Washington: Nervous and Mental
Disease Publishing Company, 1934, pp. 12-14.

In group situations, certain patterns of attraction, neglect,
and rejection develop among individuals. In classroom groups,
for instance, some children become the focal points of attraction
and their company is eagerly sought by many members
of the group. Other children may be overlooked when asso- 
ciates are selected, and still others may be actively rejected as 
companions. The social “climate” of a classroom is profoundly 
influenced by the pattern of interrelationships which prevails 
among members of the group. One classroom group may be 
drawn together by numerous attractions which extend throughout
its membership; this situation facilitates united, cooperative
effort. Another group may be composed of mutually exclusive
subgroups; in such classes the possibilities for cooperative
group activities are minimized. 10 The sociometric method
enables the teacher to obtain information concerning the pat- 
tern of relationships which forms the social “climate” in which 
pupils live in his classroom. 

The Sociometric Question 

In the sociometric method, pupils are asked to choose the 
associates they would prefer for a specific situation. The ques- 
tion might be, “Which three of your classmates would you 
prefer to have as your best friends?” This is a general question 
which implies no forthcoming action. A more specific question 
ideally would imply subsequent action: “We have decided to 
have a puppet show. Which of your classmates would you 
prefer to work with in preparing the show?” The teacher 
should design sociometric questions in such a way as to elicit 
valid or real preferences. Choices are likely to be most valid 
when the situation is real and meaningful and when the pupils 
are assured that the choices will be acted upon. The following
question, for example, meets these criteria: “You are seated 
now according to a plan which seemed convenient. You have 
now had a chance to become acquainted with each other and 
perhaps would like to be seated near someone of your own
choice. Which (two, three, four) of your classmates would
you like to have seated near you? You will be seated near at
least one of the persons you choose.” In this statement, the
situation, the purpose, and the number of preferences allowed
are all specified. Finally, assurance is given that the results
will be utilized.

10 Hilda Taba et al., Diagnosing Human-relations Needs, Washington:
American Council on Education, 1951, p. 71.

The following principles will help the teacher in framing 
sociometric questions: 

1. Give pupils a good reason for listing their preferences. 

2. Present your plans for utilizing the choices. 

3. Plan and word the directions carefully so that pupils 
will understand clearly what is wanted. 

4. State the question in such a way that pupils fully under- 
stand it. 

The sociometric question should be formulated in terms of 
the actual situation and the purposes of the teacher. However, 
the list of questions which follows may suggest some areas 
which provide meaningful situations: 

1. Which of your classmates do you prefer to have seated 
near you? 

2. Some of you are having difficulty with your work. Which 
boys or girls do you choose to help you? 

3. We are going to plan a field trip. Which boys or girls 
do you prefer to work with on the planning committees? 

4. We have planned a project in social studies. Which boys 
or girls would you like to have as members of your group? 

5. We are going to select groups for games on the play- 
ground. Which of your classmates do you prefer as members 
of your group? 

6. The other day we decided to hold a class picnic. Which 
boys or girls do you choose as members of the planning com- 
mittee? 

7. We plan to have a class party. We will be seated in 
groups around tables for lunch. Which of your classmates do 
you wish to have seated at your table? 
8. A group of pupils is to plan a program for a parent-teacher
meeting. Which of your classmates should represent
our class on the planning committee? 

The following suggestions will help the teacher in the prep- 
aration and administration of sociometric questions: 

1. Utilize realistic and meaningful choice situations which 
bear a definite relationship to the activities of the group. 

2. Word the question in such a way that the pupils under- 
stand its purposes and significance. 

3. Have a few pupils prepare a list of the first and last 
names of the members of the group. The lettering should be 
large enough so that all the pupils can read the names. 

4. Allow sufficient time for pupils to record their choices. 

5. Have pupils list their choices on a small sheet of paper or 
a 3- by 5-inch card. Each pupil should sign his paper or card 
so that he may be identified. It helps to have a sample of the 
choice blank presented on the chalkboard. A suggested form 
for recording sociometric choices is presented in Figure 9. 

6. Indicate precisely the number of choices which each 
pupil is to make. The number of choices requested will vary 
with the sociometric question, the purposes of the teacher, and 
the practical problem of the amount of time available for tab- 
ulation and evaluation of the results. Certain authorities sug- 
gest three choices by each pupil as the most practical num- 
ber. 11 Other investigators indicate that larger numbers of 
choices result in increased validity. 12 The age of the pupils is 
a further consideration, since children in the primary grades 
typically choose fewer associates than children in the middle 
and upper grades. 

7. Explain the range of choice. Ordinarily choices are limited
to members of the classroom group exclusive of the
teacher. In certain situations the range of choice may be
wider or more limited. The teacher should specify whether or
not pupils who are absent are eligible for choice.

11 Ibid., p. 76.

12 E. Eng and R. L. French, “The Determination of Sociometric
Status,” Sociometry, 11:368-371, 1948.

[Figure 9: choice blank. The blank carries the pupil's name (Beverly), grade (3), date, school, and teacher (Miss Smith), and the question, “With whom would you like to work on our project in social studies?”, followed by numbered lines for the choices, each taking a first name and last initial.]

Fig. 9. Suggested form for recording sociometric choices, showing
two sides of the choice blank.
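The suggestions above on the number and range of choices can be checked mechanically once the blanks are collected. The following is a minimal sketch in Python, not a procedure given in the text; the function name, the messages, and the roster are all hypothetical and serve only as illustration.

```python
def validate_blank(chooser, choices, roster, num_choices,
                   absent=(), allow_absent=True):
    """Return a list of problems found on one pupil's choice blank."""
    problems = []
    # Suggestion 6: each pupil makes exactly the number of choices requested.
    if len(choices) != num_choices:
        problems.append(f"expected {num_choices} choices, found {len(choices)}")
    if len(set(choices)) != len(choices):
        problems.append("a classmate is listed more than once")
    for name in choices:
        if name == chooser:
            problems.append("pupil listed own name")
        # Suggestion 7: choices stay within the stated range of choice.
        elif name not in roster:
            problems.append(f"{name} is outside the range of choice")
        elif name in absent and not allow_absent:
            problems.append(f"{name} is absent and not eligible")
    return problems


# Hypothetical roster for illustration only.
roster = {"Ann", "Ben", "Cal", "Dee"}
clean = validate_blank("Ann", ["Ben", "Cal"], roster, num_choices=2)
faulty = validate_blank("Ann", ["Ann", "Zed"], roster, num_choices=2)
```

Whether absent pupils are eligible is left as a switch, since the text asks only that the teacher state the rule in advance.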

Scoring and Tabulating the Results. In many instances it is
not necessary to weight or give score values to the choices. In
such cases the pupil’s sociometric “score” is the total number
of choices he receives from members of the group. In other
cases the teacher may wish to consider the order of preference
and assign arbitrary score values to choices in terms of the 
rank of the choice, as first choice, second choice, third choice. 
For example, in a situation where five choices are requested, 
a first choice might be assigned five points, a second choice 
four points, a third choice three points, and so on. Such scor- 
ing is arbitrary and does not necessarily reflect the actual 
value or intensity of the preference. Where a system of 
weighted scores is to be used, however, the directions to pupils 
should include the request that associates be listed in order 
of preference. 
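The two scoring plans just described, a simple count of choices received and an arbitrary weighting by rank, can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function names and the three ballots are hypothetical, and the weighting rule (first choice worth as many points as there are choices, each later choice one point less) follows the five-choice example in the text.

```python
def rank_weights(num_choices):
    """Map choice rank (1 = first choice) to an arbitrary score value."""
    return {rank: num_choices - rank + 1 for rank in range(1, num_choices + 1)}


def sociometric_scores(ballots, num_choices, weighted=True):
    """Total the score each pupil receives.

    ballots maps each chooser to the pupils chosen, in order of preference.
    """
    weights = rank_weights(num_choices)
    scores = {pupil: 0 for pupil in ballots}
    for chooser, chosen in ballots.items():
        for rank, pupil in enumerate(chosen[:num_choices], start=1):
            scores.setdefault(pupil, 0)
            # Unweighted scoring counts every choice as one point.
            scores[pupil] += weights[rank] if weighted else 1
    return scores


# Hypothetical ballots for illustration; not data from the sample group.
ballots = {
    "Ann": ["Ben", "Cal"],
    "Ben": ["Ann", "Cal"],
    "Cal": ["Ann", "Ben"],
}
weighted = sociometric_scores(ballots, num_choices=2)
unweighted = sociometric_scores(ballots, num_choices=2, weighted=False)
```

In this small case the unweighted count cannot separate the three pupils, while the weighted totals can; the weights remain arbitrary, as the text cautions.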

The tabulation sheet should contain a complete record of 
the results of the sociometric test. It should include all the 
data needed to identify the group, the date of the test, the 
nature of the question, the number of choices requested, the 
method of scoring, and other information (such as the num- 
ber of pupils absent on the day of the test) which may be 
important in interpretation of the data in the record. The tab- 
ulation plan represented in Figure 10, which presents the 
complete record for a sample group, is one of a number of 
methods which meet these requirements: 

1. Essential data are indicated at the top of the sheet.

2. The tabulation sheet is blocked off in cells, one row and
one column for each pupil in the group. First names and 
initial of last names are listed across the top and down the side 
of the tabulation sheet. 

3. Girls and boys are listed separately in alphabetical order 
according to initial of last name. A vacant row and column 
separate the two lists. This type of listing helps in the analysis 
and interpretation of the results. 

4. The columns represent preferences indicated on the 
question blanks. Choices are entered in the cell where the 
column under the name of the pupil chosen is intersected by 
the row opposite the name of the chooser. For example, Beverly
A. (first column) is chosen as an associate by June B.
The figure 5 under Beverly and opposite June indicates that
this is a first choice.

Fig. 10. Sociometric tabulation sheet.

5. To facilitate tabulation of choices, the choice blanks 
(Figure 9) can be arranged in the order in which choosers’ 
names appear on the tabulation sheet. Each choice can then 
be listed along the rows under the name of the pupil chosen. 
The score value of each choice is indicated if score values are 
used. 

6. The sociometric “score” of each pupil is derived by sum- 
ming the columns, as indicated opposite A in Figure 10. For 
example, Beverly’s sociometric score is 10, the sum of the 
score values in the column under her name. 

7. If the teacher wishes, he may also indicate the number 
of choices of each rank and the total number of choices re- 
ceived by each pupil. These figures are shown opposite B and 
C in Figure 10. 
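The tabulation plan described above amounts to a square table with choosers along the rows and pupils chosen along the columns. The sketch below, in Python, is one possible rendering and not the authors' own form; the pupil names and ballots are hypothetical, and weighted score values are assumed.

```python
def tabulate(pupils, ballots, num_choices):
    """Build the sheet: rows are choosers, columns are pupils chosen."""
    sheet = {chooser: {chosen: 0 for chosen in pupils} for chooser in pupils}
    for chooser, chosen_list in ballots.items():
        for rank, chosen in enumerate(chosen_list, start=1):
            # Enter the score value where the chooser's row meets the
            # column of the pupil chosen.
            sheet[chooser][chosen] = num_choices - rank + 1
    return sheet


def column_summary(pupils, sheet, num_choices):
    """Sum each column and count choices received by rank and in total."""
    summary = {}
    for chosen in pupils:
        column = [sheet[chooser][chosen] for chooser in pupils]
        by_rank = {
            rank: column.count(num_choices - rank + 1)
            for rank in range(1, num_choices + 1)
        }
        summary[chosen] = {
            "score": sum(column),
            "by_rank": by_rank,
            "total_choices": sum(by_rank.values()),
        }
    return summary


# Hypothetical group for illustration only.
pupils = ["Ann", "Ben", "Cal"]
ballots = {"Ann": ["Ben", "Cal"], "Ben": ["Ann", "Cal"], "Cal": ["Ann", "Ben"]}
sheet = tabulate(pupils, ballots, num_choices=2)
summary = column_summary(pupils, sheet, num_choices=2)
```

The "score" entry corresponds to the column sums of item 6; the counts by rank and in total correspond to the optional figures of item 7.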

The tabulation sheet presents a summary statement of the 
results of the test, indicating the choice status of the pupils. 
In our example, Jimmy Y., with 41 points, leads the group in
sociometric score. The leading girl is June B., with 29 points. 
No pupil is unchosen, but the lowest scores are those of 
Dolores K. (3 points) and Jackie S. (4 points) among the 
girls and Bob C. (4 points) among the boys. 

The tabulation sheet may be easily preserved for future 
reference and comparisons, and it is a basic work sheet for the 
teacher who plans to further analyze the results of testing. 
From the tabulation sheet, sociograms may be developed as 
a means of further clarifying relationships within the group. 

The Sociogram 

The sociogram is designed to portray graphically the choice
relationships which are recorded on the tabulation sheet. The
method presented here makes use of the target diagram as a
means of representing sociometric data in graphic form. This
type of sociogram is based on a series of concentric circles
bisected vertically, as shown in Figure 11. The small circles
to the left of the center line represent girls and the rectangles
to the right of the line represent boys. Pupils who rank in the
highest 25 per cent of the group are located, by initial, in the
inner circle. In the outer circle are those pupils who comprise
the lowest 25 per cent in sociometric score. The location of
pupils within the various circles roughly approximates their
sociometric rank.

[Figure 11: target sociogram for the sample group. Question: “With whom would you like to work on our project in social studies?” Number of choices represented: 2. Date of test: 4/15/57. School: P.S. 14 (city or town). Grade: Third. Teacher: Miss Smith. The legend identifies the symbol for girls and the arrow for first choices.]

[Figure 12 legend: Pupil: Dolores K. Grade: 3. School: P.S. 14 (city or town). Sex: F. Date: 4/15/57. Question: “With whom would you like to work on our project in social studies?” Choices: 5. Score values: 5, 4, 3, 2, 1 (in terms of rank of choice).]

The following suggestions will help the teacher prepare a 
sociogram of this type: 

1. Use a large sheet of paper for a trial form. 

2. Draw the concentric circles and bisecting line. 

3. Fill in the necessary identifying data (i.e., grade, school, 
date, teacher, question, number of choices to be represented, 
score values if any, meaning of the symbols employed). 

4. Within the innermost circle indicate the boys and girls 
who, according to score, rank in the upper 25 per cent of the 
group. Disperse these symbols within the circle. 

5. Indicate the relative positions of pupils within the sec- 
ond circle. Distribute the symbols throughout the available 
space. 

6. Locate the pupils with the lowest sociometric scores 
within the outer circle. These symbols should ideally be lo- 
cated so that lines can be drawn directly to the symbols in the 
central circle. 

7. Draw lines representing the direction of the first choices 
of pupils in the group, using arrow tips to indicate direction of 
choice, as in Figure 11. The number of lines can be reduced 
by using a single line with double arrow tip and bar to indi- 
cate mutual choices. 

8. Indicate second choices similarly by using a dotted or 
colored line. Other levels of choice may be indicated if the
teacher wishes. 
9. Study the trial sociogram for ways to relocate symbols
in order to improve the clarity of the diagram and estimate the 
number of choices which can be satisfactorily depicted for the 
group. 
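Steps 4 to 6 above, which place pupils in the inner, second, and outer circles, can be sketched as a simple ranking. The Python below is an illustration only: the function name and the scores are hypothetical, and two details are assumptions the text does not settle, namely that ties are broken alphabetically and that a quarter of the group is rounded to a whole number of pupils.

```python
def assign_rings(scores):
    """Place each pupil in the inner, middle, or outer circle by score rank."""
    # Highest score first; ties broken alphabetically (an assumption).
    ranked = sorted(scores, key=lambda pupil: (-scores[pupil], pupil))
    # About 25 per cent of the group, never fewer than one pupil.
    quarter = max(1, round(len(ranked) / 4))
    rings = {}
    for position, pupil in enumerate(ranked):
        if position < quarter:
            rings[pupil] = "inner"
        elif position >= len(ranked) - quarter:
            rings[pupil] = "outer"
        else:
            rings[pupil] = "middle"
    return rings


# Hypothetical scores for a group of eight; not the sample group's data.
scores = {"A": 10, "B": 9, "C": 7, "D": 6, "E": 5, "F": 4, "G": 2, "H": 1}
rings = assign_rings(scores)
```

The placement of symbols within each circle, and their relocation for clarity in step 9, remain matters of judgment that the sketch does not attempt.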

Sociograms are of great assistance to the teacher in study- 
ing relationships among pupils. The following list suggests 
some of the advantages they offer: 

1. Relative sociometric ranks are revealed graphically. 

2. Directions of choice and extent of mutuality of choice 
are indicated. 

3. Heavy concentrations of choice are revealed. 

4. Choices which run across sex lines are clearly indicated. 

5. The teacher can readily identify individuals and study 
their choice relationships with others in the group. 

6. Possibilities for grouping pupils in a psychologically 
meaningful way are portrayed. 

7. The popular individuals, the unchosen, mutual pairs, 
and subgroups are graphically depicted. 
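Several of the features listed above (mutual pairs, unchosen pupils, and choices across sex lines) can also be read directly from the choice blanks. The following Python sketch shows one way of doing so; the function names, pupils, and ballots are hypothetical and are offered only as illustration.

```python
def mutual_pairs(ballots):
    """Return pairs of pupils who chose each other."""
    pairs = set()
    for chooser, chosen_list in ballots.items():
        for chosen in chosen_list:
            if chooser in ballots.get(chosen, []):
                pairs.add(tuple(sorted((chooser, chosen))))
    return sorted(pairs)


def unchosen(ballots):
    """Return pupils whom no one chose."""
    received = {c for chosen_list in ballots.values() for c in chosen_list}
    return sorted(set(ballots) - received)


def cross_sex_choices(ballots, sex):
    """Return (chooser, chosen) pairs that run across sex lines."""
    return [
        (chooser, chosen)
        for chooser, chosen_list in ballots.items()
        for chosen in chosen_list
        if sex[chooser] != sex[chosen]
    ]


# Hypothetical group for illustration only.
ballots = {"Ann": ["Bea"], "Bea": ["Ann"], "Carl": ["Ann"], "Dan": ["Carl"]}
sex = {"Ann": "F", "Bea": "F", "Carl": "M", "Dan": "M"}
pairs = mutual_pairs(ballots)
overlooked = unchosen(ballots)
crossing = cross_sex_choices(ballots, sex)
```

Such lists supplement the sociogram rather than replace it, since the diagram also shows concentrations and chains of choice at a glance.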

For example, the following noteworthy features are revealed
concerning the group studied in Figure 11:

1. There are more girls than boys in the circle representing
high choice status. 

2. Choices of boys are heavily concentrated on Jimmy Y. 
Choices of girls show greater dispersion. 

3. Boys in this group frequently select girls as working 
companions, but girls seldom select boys. 

4. By comparison with the total number of choices de- 
picted, the number of mutual choices is relatively small. 

5. There are indications of some rather closely knit groups,
particularly in terms of first choices. 

6. No pupil, with the exception of D. K., a girl, fails to re- 
ceive either a first or second choice. This would seem to indi- 
cate relatively good relationships throughout the group. A fur- 
ther indication of a good dispersion of attractions is seen in 
the chaining of choices, as with P. V., E. R., C. F., and J. Y.

A further method of graphic representation is presented in
Figure 12, which depicts the choice relationships of an indi-
vidual pupil, Dolores K. The direction of choice is again rep-
resented by arrow tips, and the score values of choices are
indicated near the inner circle. For example, Dolores gives
her first choice (5 points) to D. B., one of the boys of the
group. She gives her second choice (4 points) to I. N. and
is the third choice (3 points) of I. N. The mutuality of choice
is indicated by double arrow tips and bar. This type of dia-
gram helps to clarify the choice relationships of individuals
whom the teacher may wish to study further and is useful in
working out committee and other classroom groupings.

Figure 12. Diagrammatic representation of the sociometric relationships
of Dolores K.
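
The score values shown in a diagram of this kind can also be tallied directly from the pupils' choice lists. The following is a minimal sketch in Python and is not part of the original procedure. The pupil names and their choices are hypothetical; the values of 5 and 4 points for first and second choices follow the example above, and the value of 3 points for a third choice is an assumption.

```python
from collections import defaultdict

# Assumed score values by level of choice (5 and 4 follow the text; 3 is assumed).
WEIGHTS = {1: 5, 2: 4, 3: 3}

# Hypothetical data: each pupil lists companions in order of preference.
choices = {
    "Ann": ["Bob", "Cal", "Dee"],
    "Bob": ["Ann", "Cal", "Eve"],
    "Cal": ["Bob", "Ann", "Dee"],
    "Dee": ["Ann", "Eve", "Bob"],
    "Eve": ["Dee", "Ann", "Bob"],
}

def choice_scores(choices, weights=WEIGHTS):
    """Sum the weighted choices that each pupil receives."""
    scores = defaultdict(int)
    for chooser, ranked in choices.items():
        scores[chooser] += 0  # keep pupils who receive no choices in the table
        for rank, chosen in enumerate(ranked, start=1):
            scores[chosen] += weights.get(rank, 0)
    return dict(scores)

def relationships_of(pupil, choices, weights=WEIGHTS):
    """List the choices one pupil gives and receives, with score values."""
    gives = [(chosen, weights.get(rank, 0))
             for rank, chosen in enumerate(choices.get(pupil, []), start=1)]
    receives = [(chooser, weights.get(ranked.index(pupil) + 1, 0))
                for chooser, ranked in choices.items() if pupil in ranked]
    # A choice is mutual when the pupil both gives it and receives it.
    mutual = sorted({name for name, _ in gives} & {name for name, _ in receives})
    return {"gives": gives, "receives": receives, "mutual": mutual}

print(choice_scores(choices))
print(relationships_of("Dee", choices))
```

The second function gathers, for one pupil, the same information that the individual diagram displays: choices given, choices received, and those which are mutual.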

ANALYZING AND INTERPRETING RESULTS 

Some of the interrelationships within a group are readily 
discerned in the results of sociometric testing. Still other sets
of relationships are not so clearly defined but may be clarified
by various techniques of representation and analysis. The so- 
ciogram and the diagrammatic representation of the choice 
relationships of individuals offer possibilities for clarification 
and analysis of the data. 

The teacher will undoubtedly study the results by means of 
questions which apply to his unique situation. The following 
questions may serve as leads in developing this type of 
analysis. 1 ’ 

1. Do the choices center upon a few pupils, or are they
relatively well dispersed? In our example, almost 20 per cent
of the choices received by boys of the group are centered 
around Jimmy Y. The teacher may wish to consider possible 
reasons for the popularity of individuals with high choice 
status. Such pupils may play important roles in determining 
classroom morale and leading the activities of the group. 

2. Are there pupils who receive no choices? Typically 
there are “isolates” in every class. The proportion of unchosen 
children is ordinarily highest in the kindergarten and the first 
two grades. In the third-grade group we have been discussing, 
there is no child who is unchosen and only one pupil, Do- 
lores K., who fails to receive either a first or second choice. 
In a two-choice situation, Dolores would be considered an 
isolate. Observation of unchosen children may reveal be- 
havioral or other factors which interfere with their acceptance 
by their peers and may suggest ways in which teachers can 
help these pupils establish themselves as accepted members of 
the group. 

3. Do choices cross sex lines, or is there a rather definite
cleavage between girls and boys? During the first three years
of school there is generally less cleavage between the sexes
than in the succeeding three years. In the case of our third-
grade group, cleavage along sex lines is especially marked in
the case of the girls. This pattern is fairly common during the
early school years and is an aspect of boy-girl relationships
which is an important consideration in the grouping of pupils
during this period.

Adapted in part from Taba et al., op. cit., pp. 83-86.

4. Is there a satisfactory degree of mutuality in the choice 
patterns? Mutuality of choice is ordinarily likely to indicate 
satisfying relationships. However, in some instances pupils 
may pair off to form small, tightly closed groups. In our ex- 
ample, a considerable degree of mutuality is evidenced when 
all choices are considered. Furthermore, considerable “chain- 
ing” is evidenced, which seems to be indicative of a series of 
attractions which run through and knit together the groups of 
boys and girls. For instance, although a triangular chain rela- 
tionship of first choices links S. L., L. M., and J. D., the pat- 
tern of second choices indicates that members of this group 
have good relationships with others of their classmates. 
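
The four questions above lend themselves to simple tabulation. The sketch below, in Python, is offered only as an illustration under stated assumptions: the pupils, their sexes, and their two choices apiece are hypothetical, and the function names are invented for this example. It counts choices received, lists unchosen pupils, finds mutual pairs, and reports the share of choices which cross sex lines.

```python
# Hypothetical data: sex of each pupil and two choices in order of preference.
sex = {"Ann": "F", "Bea": "F", "Cal": "M", "Dan": "M", "Eve": "F"}
choices = {
    "Ann": ["Bea", "Cal"],
    "Bea": ["Ann", "Cal"],
    "Cal": ["Dan", "Ann"],
    "Dan": ["Cal", "Ann"],
    "Eve": ["Ann", "Bea"],
}

def received_counts(choices):
    """Question 1: how many choices does each pupil receive?"""
    counts = {pupil: 0 for pupil in choices}
    for ranked in choices.values():
        for chosen in ranked:
            counts[chosen] = counts.get(chosen, 0) + 1
    return counts

def isolates(choices):
    """Question 2: which pupils receive no choices at all?"""
    return sorted(p for p, n in received_counts(choices).items() if n == 0)

def cross_sex_share(choices, sex):
    """Question 3: what fraction of all choices cross sex lines?"""
    pairs = [(a, b) for a, ranked in choices.items() for b in ranked]
    crossing = sum(1 for a, b in pairs if sex[a] != sex[b])
    return crossing / len(pairs)

def mutual_pairs(choices):
    """Question 4: which pairs of pupils choose each other?"""
    found = set()
    for a, ranked in choices.items():
        for b in ranked:
            if a in choices.get(b, []):
                found.add(tuple(sorted((a, b))))
    return sorted(found)

counts = received_counts(choices)
most_chosen = max(counts, key=counts.get)
print(most_chosen, counts[most_chosen] / sum(counts.values()))
print(isolates(choices), cross_sex_share(choices, sex), mutual_pairs(choices))
```

Such counts only locate the patterns; the reasons behind them still call for the observation and judgment described in the text.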

The above suggestions offer some basic possibilities for
study of the results of sociometric testing. The teacher will un-
doubtedly note other relationships which are of special inter-
est to him; for example, he may look for the choice patterns
that he expected to find. It is also profitable to look for the
unexpected; in fact, a most common reaction of teachers
using the sociometric design for the first time is surprise at
seeing relationships which they had not previously realized.
Frequently, for example, the teacher may find that a pupil is
more or less popular than he had expected; or he may find
lines of attraction taking unexpected directions or intensities.
Such events merit special study and may increase the teach-
er's understanding of his pupils.

Utilizing Results 

The feelings and attitudes, attractions and repulsions which 
pervade the group inevitably influence the learning activities 
of the classroom. Sociometric data may enable the teacher to
develop more satisfying, meaningful, and effective learning
situations, since they reveal who the preferred leaders and as-
sociates of the pupil group are. When accompanied by first-
hand observation of the pupil leaders, a knowledge of the
leadership roles of pupils is helpful in developing morale, in 
the management of the classroom, and in the development of 
psychologically meaningful pupil groups. 

Recognition and observation of pupils who receive few 
choices or none at all may alert the teacher to group or indi- 
vidual problems. The teacher who has identified pupils of this 
type and who is aware of their preferences in the group is fre- 
quently able to help such pupils attract the attention and re- 
spect of others. This can sometimes be accomplished through 
judicious grouping or through capitalizing upon a special skill 
or hobby to bring a pupil into the group. 

In classes in which pupils are organized into almost mu- 
tually exclusive groups, sociometric-test data may indicate 
linkages by means of which the teacher can encourage more 
expansive patterns of social interaction. The pupils’ choice pat- 
terns also suggest possibilities for improved group activities 
and the formation of more harmonious working groups. 

The first step in putting sociometric data to work is to act 
upon the results in terms of the purpose for which the test was 
given. If the test question referred to seating arrangement, the 
class should be reseated in a pattern closely approximating 
the choice patterns revealed by the test. Ordinarily, some 
compromises will be necessary. If the question referred to the 
formation of working groups, such groups should be organ- 
ized on the basis of the findings. Again, ingenuity will be re- 
quired in working out acceptable compromises. The following 
suggestions may help the teacher utilize test results: 

1. If possible, give the unchosen pupil his first choice. 

2. When choices are mutual, give the pupil his highest re-
ciprocated choice.

3. If the pupil has chosen only individuals who have not
chosen him in return, give him his first choice as an associate
if there is a possibility that he will be accepted by this indi-
vidual.

4. Do not place any pupil with a pupil who may actively 
reject him. 

5. In forming groups on the basis of the results of socio- 
metric tests, provide each pupil with an associate of his 
choice. If possible, organize groups in such a way that their 
members are linked together by the choice patterns. 

6. Provide for leadership which will be recognized and 
accepted by group members. 
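
Suggestions 1 to 3 amount to a simple rule for picking the associate each pupil should be given where possible. The following Python sketch illustrates that rule only; it is not a complete grouping procedure. The data and the function name are hypothetical, and the sketch does not model suggestion 4 (avoiding active rejection) or the compromises which actual grouping requires.

```python
# Hypothetical data: each pupil lists companions in order of preference.
choices = {
    "Ann": ["Bea", "Cal"],
    "Bea": ["Ann", "Cal"],
    "Cal": ["Dan", "Ann"],
    "Dan": ["Cal", "Ann"],
    "Eve": ["Ann", "Bea"],
}

def preferred_associate(pupil, choices):
    """Return the associate suggested by rules 1 to 3 for one pupil."""
    ranked = choices.get(pupil, [])
    # Rule 2: take the highest choice which is reciprocated.
    for chosen in ranked:
        if pupil in choices.get(chosen, []):
            return chosen
    # Rules 1 and 3: with no reciprocated choice, fall back on the first choice.
    return ranked[0] if ranked else None

for pupil in choices:
    print(pupil, "->", preferred_associate(pupil, choices))
```

In this example Eve, who is chosen by no one, is assigned her first choice, while the other pupils receive their highest reciprocated choice.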

The sociometric test is a tool which provides the teacher 
with information regarding the interrelationships of individ- 
uals in the group. Like other test results, this information is of 
greatest value when it is used in conjunction with data ob- 
tained from other sources. It gives impetus to the teacher’s ob- 
servations of the social interaction of pupils, and it may form 
the basis for the development of meaningful and satisfying so- 
cial and learning experiences in the classroom. 

SUMMARY 

The classroom is a social situation which has a significant 
impact upon the learning activities and social development of 
pupils. The sociometric test provides a means of studying the 
social interactions of persons in groups. The individual taking 
the test is asked to select one or a number of companions for 
a situation in which social relationships are important. 
Choices are tabulated; graphic representations may be devel- 
oped for the group or for the individual, and the results may 
be utilized as a basis for grouping pupils for the specified situ- 
ation or activity. 

The utilization of sociometric devices in the classroom pro-
vides the teacher with information regarding (1) the accept-
ance status of pupils in the group, (2) the lines of attraction 
among pupils, and (3) cleavages within the group. This in- 
formation can be of value in grouping pupils for work or 
play, in studying the problems of individual pupils, in devel- 
oping pupil leadership in classroom activities, and in the im- 
provement of relationships among members of the group. So- 
ciometric data help the teacher create the appropriate social 
setting for learning. 


STUDY AND DISCUSSION EXERCISES 

1. Present some reasons why teacher and pupil choices of 
leaders for classroom activities sometimes differ. What values do 
you see in pupil selection of classroom leaders and associates? 

2. List some classroom situations which might form the basis 
for sociometric questions. 

3. What values might the teacher derive from the use of socio- 
metric questions which refer to extraclassroom situations? 

4. What particular advantages might a teacher who is new to a 
classroom group derive from sociometric data? What difficulties 
might such a teacher find in the interpretation of the data? 

5. What advantages are there in keeping records of the results 
of successive sociometric tests? 

6. If a classroom is available to you, arrange to administer a 
sociometric test. Develop a sociogram on the basis of the results. 

7. What methods might the teacher use to find an explanation
of the results of sociometric testing?

CHAPTER TEN


Studying Interests and Attitudes 


Interests and attitudes offer clues to the understanding of 
the behavior of the individual, since both are closely related 
to emotional life. They determine essential aspects of motiva- 
tion and can facilitate or interfere with the efficiency of learn- 
ing in the classroom, for a learning program geared to the in- 
terests of the pupils becomes vital and meaningful to them. 
Favorable attitudes toward the school, the learning task, the 
teacher, and the group facilitate the pupil’s attainment of 
worthwhile educational goals. Adverse attitudes, on the other 
hand, are likely to result in discord, apathy, rebellion, tru- 
ancy, and other behavior that interferes with the attainment 
of desirable educational objectives. 

Interests and attitudes are learned. Individuals develop at-
tractions or aversions as a result of environmental opportuni-
ties, personal needs, and experiences. For example, the indi-
vidual may develop a favorable attitude toward reading or an
aversion to reading in accordance with the opportunities, sat-
isfactions, failures, or frustrations with which reading be-
comes associated in his experience. If the pupil has developed
positive attitudes, his energy can be readily directed toward
reading experiences. If he has developed negative attitudes, he
is likely to avoid reading situations.

Parents and other individuals in the child’s immediate en-
vironment influence the development of his interests and at- 
titudes. Hence, attitudes toward school and school experi- 
ences, racial and religious groups, teachers, and other chil- 
dren are frequently created before the child reaches school 
age. The teacher’s task of knowing the pupil and working 
with him is facilitated by adequate understanding of his at- 
titudinal and interest patterns. 

METHODS OF STUDYING INTERESTS 

The investigation of pupil interests may be carried out by 
means of observation techniques, interviews, direct questions, 
a check list, or an interest inventory. Studying interests by ob-
servation offers certain advantages over the use of interviews
or inventories, since it permits the teacher to study his pupils
under conditions which are natural rather than artificial. The
classroom affords many and varied opportunities to observe
behavior; the method of observation can be adapted to many
situations, and records can be kept over long periods of time.
Planned and purposeful observation is likely to arouse the
teacher's interest in and increase his understanding of pupil
behavior.

However, the teacher must be aware of the limitations of
observational methods. If the observations are carried on with
reference to too many situations or too many pupils at one
time, they may become extremely time-consuming. Probably
it is wisest to begin by keeping relatively complete records on 
a few pupils who present motivational problems. When this 
procedure is used, the observations may well be used to sup- 
plement data derived from other, less time-consuming meth- 
ods of studying interests. 

Further limitations of observation as a method of studying
the child are its subjectivity and the need for skill on the part
of the teacher. The attitudes of the teacher, the range of situ-
ations in which he observes behavior, the significance he at-
taches to specific incidents, and the degree of objectivity he
attains in recording behavior all influence the validity of the
observations.

Interviews offer possibilities for the study of interests, since 
pupils are ordinarily eager to discuss hobbies and other activ- 
ities which are of interest to them, when they find an adult 
who appears to be interested, understanding, and willing to 
listen attentively. Pupil interests form a good basis for begin- 
ning an interview which may actually have some purpose 
other than to investigate interests. The teacher may acquire 
information about feelings and attitudes as he encourages the 
pupil to talk about his after-school activities, his favorite 
games or play activities, his hobbies, trips he has taken, his 
most interesting experiences, the books he likes, his favorite 
radio or television programs, movies he has enjoyed, and so 
on. Such interviews ordinarily prove fruitful in developing 
friendly relationships and increasing understanding of pupil 
feelings and attitudes as well as locating interests. Interviews 
of this type help the teacher to plan experiences for pupils 
which will utilize their interests advantageously. 

In order to save time and gather data from the entire class- 
room group at one time, the teacher may wish to ask pupils to 
describe their preferred activities or to name their favorite 
school subjects, games, hobbies, reading material, or recrea- 
tional activities. Written reports of this type provide the 
teacher with a wealth of information which may be utilized to 
good advantage in the classroom. 

The more formal type of interest questionnaire may also 
help the teacher to know his pupils better. The questionnaire 
has the advantage of providing an economical method of 
gathering the desired data, but the method is subject to cer- 
tain limitations. For instance, the questions may or may not
be meaningful to the pupil in terms of his experience and in-
formation; and he may or may not be willing to cooperate 
fully in indicating his real preferences. The questionnaire, 
however, may include a wide range of statements related to 
interest and hence may represent a broad coverage of inter- 
est possibilities. The teacher may develop questionnaires to 
serve his specific purposes, or he may use a suitable published 
questionnaire. Interest inventories have been developed to fa- 
cilitate the study of preferences among vocations, academic 
areas, extracurricular and recreational activities, and personal 
and social activities. The accompanying examples have been 
selected from a few published inventories to indicate some of 
the methods and instruments that have been devised for the 
study of interests. 

A relatively informal listing of seventy-four interests and 
activities accompanies the California Test of Personality. 1
The directions and a few items will serve to indicate the gen- 
eral nature of the inventory. 


Interests and Activities. First look at each thing in this test. 
Make a circle around the “L” for each thing that you like or would 
like very much to do. Then make a circle around the “D” for 
things you really do. 

1. L D Play the radio 

2. L D Read stories 

3. L D Go to the movies 

4. L D Read comic strips 

5. L D Work problems 

6. L D Study history 

* * * 

70. L D Go to parties 

71. L D Go to dances 

72. L D Be an officer of a club 





73. L D Be a class officer 

74. L D Go camping 

The items of the inventory are arranged according to the
amount of activity involved, proceeding from the more indi- 
vidual and passive interests to those which are predominantly 
social and active in nature. 

Very few interest-test materials for elementary school pu- 
pils have been published. However, among the few published 
inventories of children’s interests is one entitled What I Like to
Do, 2 which is designed for pupils in grades four through seven.
The authors suggest that the inventory may be useful as an aid 
in (1) curriculum development, (2) selection of instructional 
materials, (3) parent conferences, (4) understanding of in- 
dividual differences among pupils, (5) planning for pupils in 
instructional, recreational, and educational areas, and (6) 
pupil guidance. 3 The interest areas covered are: art, music, 
social studies, active play, quiet play, manual arts, home arts, 
and science. Interest profiles provide percentile norms for 
boys and girls from grades four through six. Pupil responses 
are indicated by a cross in answer boxes under No, ?, or Yes 
for each item. The following are illustrative sample items: 

Would You Like to . . .       No    ?   Yes

1. Eat ice cream              [ ]  [ ]  [ ]
2. Play “Crack the Whip”      [ ]  [ ]  [ ]
3. Walk in the woods          [ ]  [ ]  [ ]
4. Sleep in a tent            [ ]  [ ]  [ ]

The Strong Vocational Interest Blank 4 is an example of a
carefully standardized interest inventory. Separate forms are
available for men and women. The individual is asked to in-
dicate whether he likes, dislikes, or is indifferent to each of a
list of occupations, amusements, school subjects, activities,
and groups of persons. Included also are scales in which ac-
tivities are ranked in order of preference and scales in which
a comparison of interest between two items is requested. The
inventory includes a self-rating of abilities and characteristics.
The individual checks “L,” “I,” or “D” (like, indifferent, or
dislike) for (1) occupations such as advertiser, architect,
army officer, artist; (2) amusements such as golf, fishing, ten-
nis; (3) school subjects such as algebra, agriculture, arith-
metic, art; (4) activities such as repairing a clock, making a
radio set, interviewing clients; (5) people such as progressive
people, conservative people, energetic people, people who
borrow things. Scores on the blanks indicate whether or not
the subject has patterns of interests similar to those of persons
who are engaged in given occupations. The Strong inventory
has been found to be useful as one source of data in counsel-
ing with high school and college students relative to academic
and vocational choices.

2 Louis P. Thorpe, Charles E. Meyers, and Marcella R. Sea, What
I Like to Do: An Inventory of Children’s Interests, Chicago: Science
Research Associates, Inc., 1954.

3 Thorpe et al., Examiner Manual for What I Like to Do: An In-
ventory of Children’s Interests, p. 3.

4 E. K. Strong, Jr., Vocational Interest Blank, Stanford, Calif.:
Stanford University Press, 1938.

The Occupational Interest Inventory 5 represents a some-
what different approach to the study of occupational prefer-
ences. The individual is asked to indicate his preferences
among paired activities such as the following: 6


A. Deliver groceries or meat to homes 

D. Wrap articles in the shipping department of a store. 

D. Raise pedigreed dogs, horses, or other animals. 

C. Operate lathes, drill presses, or planes. 

D. Direct the sales policies for a large store or firm.

E. Write stories or articles for important magazines.

5 Occupational Interest Inventory, Los Angeles: California Test
Bureau, 1944.

6 Ibid., Intermediate Inventory, Form A.

Scores on the Occupational Interest Inventory are related 
to six fields of interest: personal-social, natural, mechanical, 
business, the arts, the sciences. Scores are also available for 
types of interest such as verbal, manipulative, and computa- 
tional. The last section of the inventory is designed to identify 
the level of the individual’s interest which may be associated 
with tasks at the routine level, the skilled levels, or a level 
which requires expertness, skill, judgment, and perhaps 
supervisory or administrative responsibilities. The test ap- 
pears in forms adapted to the upper elementary or junior 
high school age and to the high school, college, and adult 
levels. 

The Kuder Preference Record 7 appears in two forms, vo-
cational and personal, which differ in emphasis and purpose.
The vocational inventory provides a profile of scores in ten 
interest categories: outdoor, mechanical, computational, sci- 
entific, persuasive, artistic, literary, musical, social-service, 
and clerical. The personal form of the Preference Record is 
similar in format to the vocational form. It provides scores 
for different types of personal and social activities such as 
working with ideas, being active in groups, avoiding conflicts, 
directing or influencing others, being in familiar and stable 
situations. 

The Kuder inventories utilize a forced choice technique in
which the individual checks the best and least liked of three
possibilities presented in each item. For example, the indi-
vidual indicates the most and least preferred of the following: 8

a. Visit an art gallery

b. Browse in a library

c. Visit a museum

7 G. F. Kuder, Kuder Preference Record, Chicago: Science Re-
search Associates, Inc., 1948.

8 Kuder Preference Record, Personal Form AH, Chicago: Science
Research Associates, Inc., 1948.


An inventory of a somewhat different type is the Dunlap
Academic Preference Blank. 9 This is a check list designed
for use with pupils from grades 6 to 9. It consists of ninety 
words and phrases representative of eight academic areas of 
elementary schoolwork. Pupil responses indicate liking, dis- 
like, indifference, or absence of familiarity with the various 
areas. 

Interest-test scores have generally been found to possess a 
relatively high degree of reliability. Administration of the tests 
to seventeen-year-old students, to college students, and to 
adults has demonstrated that the scores have a considerable 
degree of stability. 10 However, the interest scores of high 
school students are not so stable as those of older individ-
uals. 11


The constructive use of interest inventories requires an ap-
preciation of the limitations of the instruments. The teacher
should bear in mind the following limitations: (1) Answers
depend on the individual’s present status. Since interests grow
out of experience, it is possible that future interests may de-
velop in other directions. It is entirely possible that success
in an activity which the pupil is required to pursue may
foster an interest; it is also possible that such required
participation, especially if the student is not successful in the
activity, may inhibit the development of interest. (2) Pub-
lished inventories do not necessarily include the whole range
of possible interests. The score indicates simply that of all the
interests represented on the inventory the subject is most in-
terested in a given area, not that this area is necessarily his
greatest interest. If another area had been represented, his
highest score might be different. (3) The tests do not indicate
potentiality. If a person has not yet engaged in a given ac-
tivity, his responses simply indicate that at present he has not
become interested. There is no indication that familiarity will
not generate interest.

9 Dunlap Academic Preference Blank, Yonkers, N.Y.: World Book
Company.

10 . . . Stanford, Calif.: Stanford University Press . . . 253-268.

11 . . . D. Carter, “Permanence of Vocational Interests of High
School Boys,” Journal of Educational Psychology, 32:481-494, 1941.

The interest inventory does, however, provide an effective 
means of gathering data within short periods of time and 
serves as a tool which may be helpful to the teacher in a 
variety of ways. At the secondary school level, interest-test 
results provide useful data in educational and vocational 
guidance, where test results are best used in conjunction with 
interviews designed to assist the student to reach suitable de- 
cisions. For guidance purposes the results of interest tests 
should be used in conjunction with other information in reach- 
ing a decision. Basing academic and vocational advice solely 
on the results of questionnaires is hazardous. As a starting 
point for an interview, however, these instruments are com- 
mendable, and the results of such tests are useful at the ele- 
mentary and secondary school levels in curriculum and in- 
structional planning and in working with pupils who present 
behavioral or motivational problems. 

METHODS OF STUDYING ATTITUDES 

Attitudes are predispositions or tendencies to react in cer- 
tain characteristic ways toward objects, creatures, individuals, 
institutions, races, religions, or practices. Attitudes may be 
studied by means of observation, interviews, ratings, and
various types of attitude and opinion scales. The teacher will
find the study of pupil attitudes rewarding because it yields in- 
creased understanding of pupils and assistance in the planning 
and conduct of the instructional program and the evaluation 
of educational outcomes in terms of program objectives. 

Observation. Observational methods may be utilized as a 
means of gathering behavioral data from which pupil attitudes 
may be inferred. However, the method is subject to definite 
limitations. Personal attitudes and biases are likely to influence 
teachers’ interpretations of behavior. For this reason it is ad- 
visable to record observed behavior as accurately and objec- 
tively as possible over a period of time before attempting 
interpretations. Since situational factors influence behavior, 
the record should include: (1) a reference to the specific situa-
tion, (2) a description of the circumstances associated with
the behavior, and (3) a factual statement of the behavior ob- 
served. Over a period of time the teacher may gather a series 
of records from which valid inferences regarding behavior and 
attitudes may be drawn. The following points are fundamental 
to the development of adequate observational (or anecdotal) 
records: 

1. Note the setting in which the behavior occurred, e.g., 
the classroom, the playground, the halls. 

2. Record the activity in progress, e.g., the class, extra-
curricular activity, special program, or period between
classes.

3. Note special circumstances, e.g., the individuals in-
volved, prior events which may have been influential,
plans or directions, if any, which were operating at the
time.

4. Describe the behavior concisely and factually without
interpretative terms such as “bad,” “mean,” “good.”

5. Sample behavior over a period of time.

6. Interpret cautiously on the basis of the objective data
which have been recorded. The behavior description
and the interpretation are not combined.

These requirements suggest that it is probably best to begin 
by selecting one or two pupils for intensive study rather than 
attempting to record the behavior of a considerable number. 

Attitude Scales. A number of attitude scales have been de- 
veloped, using, for the most part, one of two basic methods. 
One of these approaches, devised by Thurstone, involves the 
placement of statements upon a continuous scale from ex- 
tremely favorable to extremely unfavorable. Each item or step 
on the scale is assigned a carefully developed weighted-score 
value. The subject indicates the statements with which he 
agrees and disagrees, and a score is derived. Representative 
statements from the Thurstone scale for measuring attitudes 
toward communism are: 12 

A. Communism is the solution to our present economic prob- 
lems (9.1). 

B. Both the evils and benefits of communism are greatly ex- 
aggerated (5.4). 

C. Police are justified in shooting down Communists (0.3). 

Statement A presents a view highly favorable to communism. 
Statement C represents extreme dislike, whereas statement B 
is considered to reflect a relatively neutral attitude. The 
median scale value of the statements checked by the subject
determines his attitude score on the scale.
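
The scoring rule just described can be shown in a few lines. The Python sketch below is an illustration only: it uses the three scale values quoted above, whereas the published scale contains many more statements, and the function name is invented for this example.

```python
from statistics import median

# Scale values quoted in the text for three statements of the scale.
# The full scale contains many more statements than these three.
SCALE_VALUES = {"A": 9.1, "B": 5.4, "C": 0.3}

def thurstone_score(checked, scale_values=SCALE_VALUES):
    """Return the median scale value of the statements the subject checked."""
    if not checked:
        raise ValueError("the subject checked no statements")
    return median(scale_values[statement] for statement in checked)

# A subject who agrees with statements A and B only.
print(thurstone_score(["A", "B"]))
# A subject who agrees with statement C only.
print(thurstone_score(["C"]))
```

A subject who checks A and B receives a score midway between the two scale values, which places him on the favorable side of the neutral statement B.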

Utilizing the technique outlined above, Thurstone and 
others have devised scales for the measurement of attitudes 
toward war, the Negro, the Constitution, the law, freedom of 
speech, labor unions, the treatment of criminals, and so on.

12 L. L. Thurstone, “Attitude toward Communism,” Scale No. 6,
Form A, Chicago: University of Chicago Press. (Copyright, 1931, by
the University of Chicago Press.)

Utilizing a similar technique, Remmers and others 13 have
developed general scales designed to measure attitudes toward 
any person, group, institution, or practice. Excerpts from “A 
Scale for Measuring Attitude toward Any Institution,” de- 
veloped by Ida B. Kelly and edited by Remmers, will indicate 
the nature of these scales: 

1. Is perfect in every way. 

5. Represents the best thought in modern life.

9. Is a strong influence for right living. 

12. Is valuable in creating ideals. 

16. Aids the individual in wise use of leisure time. 

The subject is asked to check each statement with which he 
agrees. The results of the Remmers scales are comparable to 
those of the more specific scales of Thurstone. 14 

Among the many scales developed by Remmers and his as- 
sociates are scales which indicate attitudes toward: 

1. Any disciplinary procedure (V. R. Clause). 

2. Any elementary teacher (M. Amatora).

3. Any practice (H. W. Bues). 

4. Any school subject (E. B. Silance). 

5. Any proposed social action (D. M. Thomas). 

6. Any teacher (L. B. Hoshaw). 

7. Any vocation (H. E. Miller). 

Another procedure for the measurement of attitudes has 
been proposed by Likert. 15 In the Likert scales, each statement 
represents either a favorable or an unfavorable attitude. 

13 H. H. Remmers and N. L. Gage, Educational Measurement and 
Evaluation, rev. ed., New York: Harper & Brothers, 1955, 
pp. 387-389. 

14 H. H. Remmers, “Generalized Attitude Scales: Studies in 
Social-psychological Measurements,” in Studies in Higher 
Education, no. 26, Lafayette, Ind.: Purdue University, 1934, 
pp. 7-17. 

15 R. Likert, “A Technique for the Measurement of Attitudes,” 
Archives of Psychology, 22 (140), 1932. 
Strength of reaction toward each item is indicated along a 
scale running from strongly agree to strongly disagree. Favor- 
able attitudes are reflected in high scores and unfavorable at- 
titudes in low scores. Each item is carefully selected and 
tested. The procedures used in developing the Likert scales 
are not so time-consuming as those required for the Thurstone 
scales, yet the Likert scales appear to be equally reliable. 16 
The Likert method requires subjects to respond to all items 
of the scale and has some advantages in terms of possibilities 
of analysis of the results. 
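
A minimal sketch of Likert-style scoring may make the procedure concrete. It assumes five response categories scored 1 to 5 and reverses the points on unfavorable statements so that a high total reflects a favorable attitude; the category labels, the item names, and the function `likert_score` are hypothetical and are not taken from Likert's published scales.

```python
# Five response categories are an assumption of this sketch.
CATEGORIES = (
    "strongly disagree",
    "disagree",
    "undecided",
    "agree",
    "strongly agree",
)


def likert_score(items, answers):
    """Sum item points so that a high total reflects a favorable attitude.

    items   -- list of (item_id, is_favorable) pairs
    answers -- dict mapping item_id to one of CATEGORIES
    """
    total = 0
    for item_id, is_favorable in items:
        if item_id not in answers:
            # The method requires a response to every item of the scale.
            raise ValueError(f"no response to item {item_id!r}")
        points = CATEGORIES.index(answers[item_id]) + 1  # 1 through 5
        if not is_favorable:
            # Reverse the points on statements that express an
            # unfavorable attitude.
            points = len(CATEGORIES) + 1 - points
        total += points
    return total


# Hypothetical two-item scale: one favorable and one unfavorable statement.
items = [("likes_school", True), ("school_is_a_waste", False)]
answers = {"likes_school": "agree", "school_is_a_waste": "strongly disagree"}
print(likert_score(items, answers))
```

Raising an error on a missing response mirrors the requirement that subjects respond to all items; a total computed from a partial set would not be comparable across subjects.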

The Scale of Social Distance developed by Bogardus 17 is 
an instrument designed to indicate attitudes toward persons of 
various nationalities and races. Seven degrees of closeness are 
represented in the statements of the subject concerning his 
willingness to admit members of a national or racial group to 
(1) close kinship by marriage, (2) his club, (3) his street as 
neighbors, (4) the same occupation as himself, or (5) citi- 
zenship in his country; (6) as visitors only to his country; or 
(7) to exclude them from his country. Although the scale was 
designed for the study of attitudes toward racial and national 
groups, the method is readily adaptable to the study of atti- 
tudes toward members of a variety of religious, social, politi- 
cal, and vocational groups. 

In a study of the development of attitudes toward the 
Negro, Horowitz 18 used pictures of Negro and white boys. 
Pupils from kindergarten through eighth grade were first asked 
to rank the pictures in order of preference. Next they were 
asked to use the pictures as a basis for the selection of 
companions for various situations and activities. For example, 
children were selected as classmates, captain of the ball team, 
luncheon companions, party guests, members of the gang, 
neighbors, and so on. Pictures of social situations involving 
the two races were also presented to afford further 
opportunities for expressions of attitudes. Horowitz found that 
prejudice appeared at an early age and that attitude 
development is relatively consistent for groups and for 
individuals. 

16 R. Likert and others, “A Simple and Reliable Method of Scoring 
the Thurstone Attitude Scales,” Journal of Social Psychology, 
5:228-238, 1934. 

17 E. S. Bogardus, “Measuring Social Distance,” Journal of Applied 
Sociology, 9:299-308, 1925. 

18 E. L. Horowitz, “The Development of Attitude toward the 
Negro,” Archives of Psychology, 28 (194), 1936. 

An interesting approach to attitude assessment appears in 
Minard’s study of racial attitudes. 19 Statements were written 
involving situations which would place individuals in close 
proximity to members of such racial or national groups as 
Filipinos, Chinese, Mexicans, and Negroes. The situation 
might center around neighborhood residence, team or club 
membership, and so on. 

Attitudes toward the self, classmates, home, school, or per- 
sons in authority are frequently indicated in pupil responses 
to personality inventories. For instance, an analysis of the re- 
sponses of a boy to selected items of the California Test of 
Personality 20 may suggest that he considers his classmates to 
be mean, willing to cheat, unreasonable, unfair, and willing 
to take advantage of him whenever possible. Clues of this 
nature provide a basis for understanding his behavior. 

Attitude scales have been criticized because of the absence 
of objective evidence of their validity; there is frequently no 
ready means of checking the individual’s verbal report. The 
subject may wish to conceal his real attitudes or may not be 
aware of them. However, responses to attitude scales and 
studies of attitudes, when critically and cautiously interpreted, 
provide the teacher with pertinent information concerning 
the prevailing attitudes of pupils toward school subjects, 
individuals, groups, and practices, and changes of attitude 
produced by presentations, discussions, interviews, or other 
techniques. Teacher evaluation of attitudes is in actuality 
almost a necessity, since many of our most worthwhile 
educational goals are related to the development of pupils’ 
attitudes. We call these goals character development, 
citizenship, moral and ethical behavior, or social cooperation. 

19 R. D. Minard, “Race Attitudes of Iowa Children,” Studies in 
Character, 4 (2), University of Iowa, 1931. 

20 W. W. Clark, E. W. Tiegs, and L. P. Thorpe, California Test of 
Personality, Los Angeles, Calif.: California Test Bureau, 1942. 

APPLICATIONS IN THE CLASSROOM 

Interests and attitudes are perhaps generally thought of as 
sources of motivation for learning. However, motives, values, 
attitudes, interests, and ideals which are socially acceptable 
and personally satisfying are not only valuable as supports for 
academic learning but represent valid educational goals in 
themselves. In the evaluation of the educational growth of 
pupils, the development of interests and attitudes deserves 
careful consideration. 

A study of pupils’ interests by any of the techniques sug- 
gested may provide the teacher with information useful in: 

1. Understanding pupils. 

2. Discovering motivational possibilities. 

3. Relating teaching to pupils’ interests and experience. 

4. Studying and evaluating pupils' interest changes. 

5. Helping pupils to: (a) become aware of their interests, 
(b) evaluate their interests, and (c) increase their 
understanding of themselves. 

6. Stimulating thought and discussion among pupils con- 
cerning the implications of their interests. 

Investigations of attitudes provide the teacher with data 
which may be significant in a number of respects. Such data 
may enable the teacher to: 

1. Attain an increased understanding of pupils. 

2. Attain deeper understanding of pupil behavior. 

3. Develop curricular, field, social, or civic experiences re- 
lated to major educational goals. 

4. Evaluate pupil behavior on a broader basis than that 
of subject-matter attainment. 

5. Study attitude change as the result of directed experi- 
ences. 

6. Assess the relative effectiveness of various teaching 
methods and techniques as a means of influencing pupil 
attitudes. 

Data derived from careful studies and evaluations of in- 
terests and attitudes will be of value in compiling cumulative 
records and in accurate reporting of educational attainments 
not represented in achievement-test results. For example, such 
characteristics as cooperation, self-control, self-confidence, 
tolerance, optimism, leadership, respect for the rights of 
others, respect for the contributions and ideas of others, and 
such attitudes as those toward civic affairs and authority form 
an essential part of the evaluation of pupil progress and at- 
tainment. A sincere attempt on the part of the teacher to de- 
velop adequate bases for judgment with respect to such char- 
acteristics as those listed above could be expected to improve 
the teacher's understanding of pupil behavior and his 
evaluation of pupil status and progress. 21 

SUMMARY 

Interests and attitudes are essential aspects of the emo- 
tional and behavioral life of the individual and are essential 
in motivation and learning. The assessment of interests and 
attitudes is an important aspect of the evaluation of pupil 
progress and attainment with respect to important educational 
objectives. 

21 For a list of educational objectives and suggested means for 
evaluating them, see J. W. Wrightstone, “Measuring the 
Attainment of Newer Educational Objectives,” Sixteenth Yearbook 
of the Department of Elementary School Principals, Washington: 
National Education Association, 1937, pp. 493-501. 

Interests can be studied by a variety of methods: observation, 
direct questions, check lists, and interest inventories. A 
variety of instruments provide means of gathering data 
concerning a broad range of pupil preferences with regard to 
school subjects, activities, forms of recreation, o 

Attitudes can be investigated by means of observations, 
anecdotal records, interviews, ratings, and attitude and 
opinion scales. Data concerning pupil attitudes may contribute 
to the understanding of pupils, the planning and conduct of the 
instructional program, the evaluation of pupil progress, 
and the development of adequate records and reporting 
practices. A number of scales are available for the study of 
significant attitudes. In his use and interpretation of these 
scales, the teacher should consider the method utilized in 

The study of pupil interests and attitudes may contribute 
significantly to the educational program with reference to 
instruction, evaluation, planning, recording, and reporting. 


STUDY AND DISCUSSION EXERCISES 

1. In what ways is it valuable for the teacher to understand 
techniques for the measurement of interests? 

2. Select an interest inventory and suggest specific ways in 
which its results can be of value to the classroom teacher. 

3. List a number of attitudes which you consider closely related 
to the effectiveness of classroom learning. ... attitudes in 
specific behavioral terms. ... the teacher study this attitude 
among pupils in his classroom? 

4. Select a published interest inventory. Outline the bases for 
its development and ... possibilities of interpretation of 
scores which might be ... this inventory. 

5. Indicate as specifically as you can the contributions which 
teacher investigations of pupil interests and attitudes can make to 
(a) teacher understanding of pupil behavior, (b) the development 
and maintenance of cumulative records, (c) pupil-teacher 
conferences, (d) parent-teacher conferences, and (e) reports of 
pupil progress. 

6. Observe pupils in social situations in the classroom and out 
of the classroom and write behavioral descriptions. Make a record 
of the social attitudes which appear to be represented in the be- 
havior observed. 


SUGGESTED ADDITIONAL READINGS 

Cronbach, L. J.: Essentials of Psychological Testing, New York: 
Harper & Brothers, 1949. 

Chapters 15 and 17 include descriptions of methods and instru- 
ments used in the assessment of interests and attitudes, with 
suggestions for the applications of results. Chapter 18 is 
concerned with observation as a method of studying behavior. 

Greene, E. B.: Measurements of Human Behavior, rev. ed., New 
York: The Odyssey Press, Inc., 1952. 

Chapters 20 and 21 are devoted to the measurement of inter- 
ests and attitudes and contain descriptions of instruments and 
discussions of methods of assessment. 

Jordan, A. M.: Measurement in Education, New York: 
McGraw-Hill Book Company, Inc., 1953. 

Chapters 16 and 17 present an account of interest and attitude 
measurement. Chapter 16 includes a list of published interest 
inventories. 

Remmers, H. H., and N. L. Gage: Educational Measurement and 
Evaluation, New York: Harper & Brothers, 1955. 

Chapter 13 presents a discussion of the nature, organization, 
and significance of attitudes, and chap. 14 contains a 
discussion of methods of studying attitudes and interests. 

Super, D. E.: Appraising Vocational Fitness, New York: Harper 
& Brothers, 1949. 

Chapters 16, 17, and 18 are devoted to a detailed description 
of the nature and measurement of interest; the emphasis is 
primarily vocational. 



CHAPTER ELEVEN 


Rating Techniques in Pupil 
Evaluation 


Some of the most important results of education cannot be 
evaluated by the usual paper-and-pencil tests: the acquisition 
of effective work habits and study skills, for example, and the 
development of acceptable social attitudes and behaviors. 
Good work habits, cooperativeness, industry, responsibility, 
and citizenship are commonly listed on report cards and 
cumulative records and are recognized as standard educational 
objectives. Rating methods are among the possibilities of 
evaluating pupil progress toward these educational goals. This 
chapter describes means of summarizing and recording teacher 
ratings and suggests ways of improving the methods which 
the teacher may use. 


PROBLEMS IN THE USE OF RATING METHODS 

As we have seen, tests are tools to provide data upon 
which to base estimates and judgments. A rating represents 
an estimate or judgment regarding a pupil characteristic, based 
on the teacher’s observations of the pupil. Test results imply, 
in the nature of the items, certain definitions of intelligence, 
achievement, readiness, and so on, and these definitions form 
a point of reference for the interpretation of test results. On 
the other hand, a rating on citizenship may have no such 
definitive point of reference. A teacher or parent may well 
ask, “What does the rater mean by citizenship? On what kind 
of data is his rating based?” Ratings typically are highly sub- 
jective in nature. They tend to reflect the characteristics of the 
rater to almost as great an extent as they do those of the in- 
dividual being rated. 

The most common sources of error in ratings are inade- 
quate or inconsistent definition of traits, fixed patterns of 
rating, and halo effect. A further problem is lack of con- 
sistency between several ratings of the individual on the same 
trait. 

Perhaps the key problem in the interpretation of ratings is 
the definition of the rated characteristic. Suppose that teachers 
A and B are rating pupils on cooperation. The definition uti- 
lized by teacher A may involve a large measure of obedience 
or emphasize cooperation with the teacher. Teacher B may 
evaluate the same trait almost entirely on the basis of ability 
to work cooperatively with other pupils. Ratings of the same 
pupils by these two teachers would bear no necessary 
relationship to one another. At the same time, a parent 
attempting to interpret the ratings might have in mind a 
definition of cooperation which differs markedly from those of 
the two teachers. Unless traits are clearly defined, ratings 
may be meaningless. 

The fixed pattern is a common source of error in ratings. 
Some raters, for example, are inclined to be consistently 
overgenerous in their judgments. This has been termed the 
generosity error. A second, and smaller, group of raters 
demonstrate a consistent tendency to underrate, and still others are 
prone to rate almost everyone as average regardless of exist- 
ing differences. The resulting ratings reflect the characteristic 
evaluative tendencies of the rater and may have little reference 
to the actual characteristics of the individuals being rated. Fig- 
ure 13 illustrates the possible effects of such rating patterns 
with respect to a hypothetical class of twenty students. Teacher 
A rates 45 per cent of these pupils as “excellent” or “su- 
perior”; he is relatively generous in his ratings. Teacher B is 
apparently unable to differentiate among a majority of the 
pupils, since he places 70 per cent in a single category in the 


Trait: cooperativeness 

Scale        Excellent   Superior   Good   Fair   Poor 
                 %           %        %      %      % 
Teacher A       20          25       40     10      5 
Teacher B        5          10       70     10      5 
Teacher C        5          10       30     35     20 

Fig. 13. Per cent of a hypothetical class of twenty pupils placed 
by three teachers under each of five levels of a scale for rating 
cooperativeness. 

center of the scale. This is an instance of the “average” error. 
Teacher C rates 55 per cent of the group as “fair” or “poor,” 
illustrating the error of underrating. These ratings would be 
difficult to interpret apart from a knowledge of the raters and 
their characteristic rating patterns. 
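
The three patterns can be checked mechanically against a rater's distribution. The sketch below uses the percentages of Figure 13 (for example, Teacher A's 45 per cent in the two highest levels and Teacher B's 70 per cent in the center). The function `rating_pattern` and its two cutoffs are assumptions chosen only to separate these three illustrative cases; they are not established criteria.

```python
LEVELS = ("excellent", "superior", "good", "fair", "poor")

# Per cent of the class placed at each level, as given for Figure 13.
FIGURE_13 = {
    "Teacher A": (20, 25, 40, 10, 5),
    "Teacher B": (5, 10, 70, 10, 5),
    "Teacher C": (5, 10, 30, 35, 20),
}


def rating_pattern(percentages, center_cutoff=60, margin=25):
    """Label a rater's distribution; both cutoffs are illustrative only."""
    if len(percentages) != len(LEVELS) or sum(percentages) != 100:
        raise ValueError("expected five percentages that total 100")
    top = percentages[0] + percentages[1]      # excellent + superior
    center = percentages[2]                    # good
    bottom = percentages[3] + percentages[4]   # fair + poor
    if center >= center_cutoff:
        return "average error"
    if top - bottom >= margin:
        return "generosity error"
    if bottom - top >= margin:
        return "underrating"
    return "no marked pattern"


for teacher, percentages in FIGURE_13.items():
    print(teacher, "->", rating_pattern(percentages))
```

A summary of this kind describes the rater, not the pupils: it indicates how a given teacher's ratings should be read, which is the knowledge the paragraph above says is needed for interpretation.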

The halo effect is a further common source of error in 
ratings. The teacher forms a general impression concerning 
the pupil, and his ratings of the pupil’s traits are as likely to 
be representative of this general impression as they are of the 
specific characteristic being rated. For example, Mary may 
have a pleasing appearance and manner. Teacher ratings of 
such traits as dependability, emotional stability, and 
cooperativeness may be influenced favorably by the teacher’s 
general impression of Mary rather than by her actual status with 
respect to the specific characteristic being evaluated. 
Unfavorable general impressions may lead to equally unrealistic 
evaluations. The influence of the halo effect is illustrated in 
Figure 14. 

Fig. 14. Influence of the halo effect. 

Inconsistencies frequently appear in repeated ratings of a 
characteristic. That is, two teachers rating a pupil on a given 
characteristic may vary markedly in their evaluations. Again, 
a teacher may change his rating of a given pupil even in a 
short space of time. This lack of consistency or reliability 
poses a problem with respect to the interpretation of ratings. 
By way of analogy, consider the situation which develops 
when two sets of comparable mental- or achievement-test 
scores give entirely different pictures of the same person. Is 
the difference a result of the type of test used? Can it be 
explained in terms of differences in the two testing situations? 
Was the subject in poor physical health or emotionally upset 
at one of these times? How can the differences be explained? 
Similarly, interpretation is difficult when ratings of the same 
person differ with respect to identical characteristics. Among 
the reasons for inconsistencies may be the characteristics of 
the rater, factors in the situation in which ratings are 
developed, the nature of the characteristic being evaluated, the 
extent of opportunity to observe the individual, and changes 
in the individual over the period of time between ratings. 
There are many possible explanations. 

Ratings based on casual or incidental impressions are 
notably unreliable, but ratings based upon careful and systematic 
observations of well-defined characteristics can be quite 
reliable. Again, certain characteristics are more reliably 
evaluated by rating methods than others. Ratings of 
characteristics which can be observed objectively are typically 
most reliable. Ratings of general characteristics and traits 
which involve interaction with others tend to be least 
reliable. 1 For example, traits such as cooperativeness and 
integrity are relatively difficult to evaluate by rating methods. 

ORGANIZATION OF RATING SCALES 

The usual rating scale presents the rater with a set of 
characteristics (such as ...). These traits may or may 
not be defined. The rater is asked to ... by 
checking a point on a scale ... 
the trait. The list that follows indicates various ways in which 
the levels or degrees may be indicated: 

1. By means of numbers: 1, ... 

2. In terms of frequency of occurrence of the trait: always, 
usually, seldom, never. 

3. By qualitative terms: excellent, superior, good, fair, 
poor. 

4. By terms which refer to relative status, e.g., relative to 
others: outstanding, above average, average, below 
average, inferior. 

5. By descriptive terms which apply to each level or step, 
e.g.: 

a. Recognized as a leader; assumes leadership willingly. 

b. Accepts leadership when specifically requested to 
do so. 

c. Avoids leadership. 

6. By means of coded numbers or letters: 

1 or A represents excellent. 

2 or B represents above average. 

3 or C represents average. 

4 or D represents below average. 

5 or E represents inferior. 

1 H. L. Hollingworth, Judging Human Character, New York: 
Appleton-Century-Crofts, Inc., 192 . 


Each of these types of organization of rating scales may be 
of value as a means of providing the teacher with a frame of 
reference or a guide, provided the type of organization is 
suited to the trait being rated and the purposes which the 
rating is designed to serve. For example, Schedule A of the 
Haggerty-Olson-Wickman Behavior Rating Schedules utilizes 
an organization based on the frequency with which a 
type of behavior occurs. Four levels of frequency are indicated 
for each of the fifteen traits listed. A weighted score or 
quantitative value has been established for each rating. This 
quantitative value is based on the seriousness and frequency of 
occurrence of the behavior among school children (Fig. 15). 2 

2 Haggerty-Olson-Wickman Behavior Rating Schedules, Directions, 
Yonkers, N.Y.: World Book Company, 1930. 
                          Frequency of occurrence 

Behavior          Has never   Has occurred    Occasional   Frequent 
problem           occurred    once or twice   occurrence   occurrence 
                              but ... 

Disinterest in        0           ...            ...          ... 
  school work 
Truancy               0           12             18           21 

Fig. 15. Organization of Schedule A of the Haggerty-Olson-Wickman 
Behavior Rating Schedules, indicating basis in frequency of 
occurrence and showing weighted scores. 

Schedule B of the Haggerty-Olson-Wickman Scales combines 
descriptive categories and quantitative values. The 
weighted scores of Schedule B have been assigned on the basis 
of relationships between ratings on each of the thirty-five 
traits and the behavior tendencies listed under Schedule A. 3 
Figure 16 illustrates the organization of Schedule B. These 
schedules represent an attempt to assign quantitative 
values to rating categories; they also illustrate the definition 
of scale steps in terms of frequency of occurrence and descrip- 

Is his attention sustained? 

Continually    Frequently    Usually           Wide     ... 
absorbed       becomes       present-minded    awake 
in himself     abstracted 
   (5)            (4)           (2)             (1)     (3) 

Fig. 16. Organization of Schedule B of the Haggerty-Olson-Wickman 
Behavior Rating Schedules. Descriptive categories are 
accompanied by weighted score values. 

3 Ibid. 
IMPROVING RATINGS OF PUPIL 
CHARACTERISTICS 

In the use of rating devices there is no substitute for 
conscientiousness, skill, and objectivity on the part of the rater. 
However, work with rating instruments indicates that 
evaluations based on these devices may be improved through the 
following procedures: 

1. Selecting carefully the traits which are to be evaluated. 

2. Defining the traits. 

3. Describing the traits. 

4. Establishing a basis for judgment. 

5. Establishing scale steps. 

6. Organizing the rating instrument. 

1. Selecting the traits. In selecting a list of traits to be 
evaluated by rating methods, the teacher should consider the 
purpose of the evaluation and the extent to which each trait is 
related to educational objectives. Carefully developed ratings 
of significant pupil characteristics provide essential 
information in the evaluation of pupil status and progress. Such 
ratings will increase the value of pupil records and reports and 
may play an important role in the teacher’s instructional 
planning. 

The traits selected for use in a teacher-developed rating 
instrument should be (a) relatively few in number, (b) 
critically related to the teacher's purposes in rating, (c) as 
clearly differentiated as possible, and (d) capable of clear and 
precise definition, preferably in terms of observable behavior. 

2. Defining the traits. A majority of the characteristics 
which the teacher evaluates by means of ratings may be 
defined in a number of different ways. Since teacher ratings may 
be a means of providing information to other persons, such 
as the pupil, parents, or other teachers and professional 
persons, it is important that the rated characteristics be defined 
carefully, objectively, and specifically. Such definitions are 
necessary to interpretation of the evaluation represented by 
the rating. Probably the ratings that lend themselves best to 
accurate interpretations are those that can be described in 
terms of observable behavior, for many traits (e.g., 
“cooperativeness”) are subject to a number of possible 
definitions depending on the individual who interprets the 
term. However, the teacher can clarify his intention by means 
of a recorded definition such as the following: 
“Cooperativeness: The pupil’s ability to work harmoniously with 
his classmates in classroom activities and projects.” 

Certain advantages are achieved through the use of this 
type of definition. The ratings refer to classroom activities 
which can be observed by the teacher. There is no implication 
that the rating applies to the pupil’s behavior in situations in 
which the teacher has limited opportunity to conduct 
systematic observations. Again, the reference applies to 
cooperation with other pupils rather than cooperation with the 
teacher. 

The pupil traits listed on report cards and other rating 
instruments are often inadequately defined on the report or 
schedule. In such instances it is usually advisable for the 
teacher to decide upon a clear definition and to record it. 

3. Describing the traits. As we have seen, a clear definition 
of traits is helpful in rating. However, definitions are general 
descriptions or summary statements, and ratings are more 
likely to be reliable and meaningful when they are based on 
specific behaviors which serve as indicators of the 
characteristics being assessed. For example, having defined 
cooperativeness, the teacher lists pupil behavior which is 
related to the definition. The teacher’s worksheet might look 
like this: 

Cooperativeness: The pupil’s ability to work harmoniously with 
his classmates in classroom activities and projects. 

1. Participates actively in group planning, work, and discussion. 

2. Brings materials and ideas to class to share with others. 



Behaviors Which Indicate That One Is “Considerate of Others”* 

Teacher ______________ Pupil ______________ 

“Average” means that the pupil exhibits the behavior indicated to 
about the same degree as the average pupil of his grade level. 

1. Shares materials willingly and properly 

2. Observes normal courtesies in personal relationships with 
others 

3. Participates in and makes positive contribution to group 
activities 

4. Returns materials to proper places after use 

(Other behaviors; write in and rate) 


Behaviors Which Indicate That One Is “Not Considerate 
of Others”* 

Teacher ______________ Pupil ______________ 

“Average” means that the pupil exhibits the behavior indicated to 
about the same degree as the average pupil of his grade level. 

1. Interrupts others who are speaking 

2. Doing such things as cleaning fingernails, cleaning out 
purse, combing hair, etc., while other students are making 
reports 

3. Crowding ahead of others in lunch line, while coming into 
or leaving classroom 

4. Cutting across, shoving, or crowding in corridors 

5. Loud and boisterous in corridors 

(Other behaviors; write in and rate) 

* Springfield, Missouri, Senior High School. 

Fig. 17. Examples of rating scales. 
3. Listens to the ideas and experiences of others. 

4. Respects the opinions of others. 

5. Shares his opinions with members of the group. 

6. Abides by the decisions of the group. 

7. Works with class officers, committees, and group leaders. 

8. Does not needlessly disturb the work of others in the group. 

9. Carries his share of responsibility for the work of the group. 

Such a list of behavioral indicators provides a relatively tangi- 
ble basis for the observation and evaluation of the charac- 
teristic. 
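
A teacher who keeps such a worksheet on file might record observations against it roughly as follows. The sketch is a hypothetical record-keeping aid: the class name `TraitWorksheet`, its methods, and the pupil name are assumptions, and only the definition and the nine indicators are taken from the worksheet above.

```python
from dataclasses import dataclass, field


@dataclass
class TraitWorksheet:
    """A defined trait, its behavioral indicators, and observations made."""
    trait: str
    definition: str
    indicators: list
    observed: dict = field(default_factory=dict)  # pupil -> indicator numbers

    def record(self, pupil, indicator_number):
        """Note that the pupil was seen to show the numbered indicator."""
        if not 1 <= indicator_number <= len(self.indicators):
            raise ValueError("no such indicator on this worksheet")
        self.observed.setdefault(pupil, set()).add(indicator_number)

    def summary(self, pupil):
        """Return (indicators observed, indicators listed) for the pupil."""
        return len(self.observed.get(pupil, set())), len(self.indicators)


cooperativeness = TraitWorksheet(
    trait="Cooperativeness",
    definition=("The pupil's ability to work harmoniously with his "
                "classmates in classroom activities and projects."),
    indicators=[
        "Participates actively in group planning, work, and discussion.",
        "Brings materials and ideas to class to share with others.",
        "Listens to the ideas and experiences of others.",
        "Respects the opinions of others.",
        "Shares his opinions with members of the group.",
        "Abides by the decisions of the group.",
        "Works with class officers, committees, and group leaders.",
        "Does not needlessly disturb the work of others in the group.",
        "Carries his share of responsibility for the work of the group.",
    ],
)

cooperativeness.record("Mary", 2)   # hypothetical pupil and observation
cooperativeness.record("Mary", 6)
print(cooperativeness.summary("Mary"))
```

The count is a record of what was observed, not a rating in itself; the judgment about the pupil's standing on the trait still rests with the teacher.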

4. Establishing the basis for judgment. Judgments with re- 
spect to any characteristic may be either absolute or relative. 
The statement, “This object is five feet high,” illustrates an 
absolute judgment. Linear and quantitative measures of 
length, height, weight, etc., are based on standard units which 
give them a common meaning. A rating scale involving ab- 
solute judgments asks the rater, in effect, such questions as 
“Is he cooperative? To what degree?” This is probably the 
type of rating scale most commonly used. However, one 
might ask whether all raters use the same standard or unit of 
measurement as a basis for their judgments. 

In utilizing relative methods, the rater decides whether the 
pupil is relatively more or less cooperative than others. The 
comparison may be limited to members of an age group, 
grade level, or classroom group. Interpretations based on 
ratings of this type specify or imply the limitation that the 
pupil is being compared with others of a group. Thus the 
ratings of pupils will be dispersed around the level of average or 
typical performance for the specified group. The technique is 
roughly similar to that involved in the development of 
“norms”; that is, the pupil is rated in terms of his status with 
respect to his group rather than in terms of a “standard” or 
“expected” level of attainment. The rating scales presented in 
Figure 17, prepared for the Springfield, Missouri, Senior High 
School, are examples of relative scales. 
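
One way to picture a relative judgment is to place each pupil against the average of his own group. The sketch below assumes that the teacher already has some numerical tally for each pupil (for instance, a count of observed cooperative behaviors). The function `relative_ratings`, the half-standard-deviation band, and the pupil names are hypothetical choices made for illustration and are not a procedure given in the text.

```python
from statistics import mean, pstdev


def relative_ratings(tallies, band=0.5):
    """Rate each pupil relative to the group average.

    tallies -- dict mapping pupil to a numerical tally
    band    -- half-width of the "average" zone, in standard deviations
               (an arbitrary choice for this sketch)
    """
    group_mean = mean(tallies.values())
    spread = pstdev(tallies.values())
    ratings = {}
    for pupil, tally in tallies.items():
        if spread == 0 or abs(tally - group_mean) <= band * spread:
            ratings[pupil] = "average"
        elif tally > group_mean:
            ratings[pupil] = "above average"
        else:
            ratings[pupil] = "below average"
    return ratings


# Hypothetical tallies for a group of four pupils.
print(relative_ratings({"Ann": 9, "Bob": 5, "Cal": 5, "Dee": 1}))
```

The same tally could earn a different rating in a different group, which is the limitation that interpretations of relative ratings must specify or imply.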



too few steps are included in the scale, it will not accurately 
reflect trait differentiations among individuals. Too many 
steps, on the other hand, may make the task of the rater cum- 
bersome or make differential judgments difficult. As with 
other evaluative devices, the purpose of rating scales is to dif- 
ferentiate among individuals in terms of specified characteris- 
tics. More refined and specific differentiations ordinarily rep- 
resent a more adequate basis for interpretation and evalua- 
tion. 

6. Organizing the rating instrument. Rating instruments
are customarily organized according to one of four general
plans: (a) the check list, (b) the coded scale (using coded
numbers or letters), (c) the graphic form, and (d) the de-
scriptive scale.

The check list presents the rater with a list of character-
istics or behaviors to be checked off if they appear to apply
to the person being rated. The “Behavior-observation Record”
(Figure 18) is such an instrument, developed to help teachers
understand the behavior of their pupils.

The coded scale is commonly used in pupil report cards.
Typically, it employs numerals or letters which are described
in one section of the card. Following each rated character-
istic, the code number or letter (frequently a grade) is in-
dicated to represent pupil standing. The following excerpts
from the “Primary Pupil Progress Report” of the Corvallis
Public Schools (Figure 19) are illustrative of such organiza-
tion. 5 The brief statement of philosophy will perhaps indicate
the basis for the letter and number codes used in connection
with this particular pupil report card. The numerals opposite
reading items represent the pupil’s academic status in the sub-
ject as interpreted under “Subject-matter Evaluation.” Let-
ters opposite the citizenship items indicate the pupil’s standing
with respect to each of the listed traits.

5 “Primary Pupil Progress Report,” Corvallis Public Schools, Corvallis, Ore.
A special type of relative scale is the “man-to-man” scale.
In developing this type of instrument the teacher selects cer- 
tain pupils as “standards” for the various steps of each trait 
scale. One pupil is selected to represent each step of the 
scale, and others are rated by comparison with these “stand- 
ards.” This method of establishing a basis for judgment ap- 
pears to have definite possibilities for classroom use. 

5. Establishing scale steps. A further procedure in estab- 
lishing the basis for judgment is developing categories, levels, 
or steps which represent the scale for each trait. Ordinarily 
we think of any trait variable (such as energy, enthusiasm, or 
initiative) as essentially continuous. In practice, however, a 
number of areas or “units” are established along the trait scale 
as a matter of convenience rather than fact. For example, a 
scale for enthusiasm might be represented as follows: 
Enthusiasm:

Completely                                        Extremely
indifferent ____________________________________ enthusiastic

The continuous line represents the range from, say, zero 
enthusiasm to the highest extreme of the trait. But it would 
be difficult accurately to rate pupils along a continuum of 
which only the extremes are described. The following ar- 
rangement is more useful because five specific steps are de- 
scribed along the continuum. 4 

Enthusiasm:

Indifferent   Rarely shows   Sometimes      Usually has     Works with
              enthusiasm     enthusiastic   pep and vigor   great
                                                            enthusiasm


The number of steps included in a scale will be determined, 
in part at least, by the purposes for which the ratings are to 
be used and by the nature of the trait. In general, however, if 


4 Summary Behavior Rating Scale, Springfield (Missouri) Senior
High School, 1949 (mimeographed).



Corvallis teachers believe that each child’s progress should be 
reported to him and his parents at least three times each year. 
They also believe that each pupil should be evaluated in terms 
of his individual growth and progress and in terms of his achieve- 
ment in academic work. In order to do this dual evaluation task, 
two different sets of symbols and meanings are required. 

Individual Evaluation 
A — Pupil is using all his ability. 

B — Pupil is using nearly all his ability. 

C — Pupil is using about half of his ability. 
D — Pupil is using less than half of his ability.

E — Pupil is using almost none of his ability and is making very 
little individual progress. 

Subject-matter Evaluation 

1 — Pupil’s achievement and position in this subject are excellent. 

2 — Pupil's achievement and position in subject are above average. 

3 — Pupil’s achievement and position in this subject are average.

4 — Pupil’s achievement and position in subject are below average. 

5 — Pupil’s achievement and position in this subject do not meet 
the standards for this subject. 

                                      First    Second   Third
READING                               report   report   report
Reads with understanding                2
Reads well to others                    3
Shows ability to attack new words       2
Enjoys stories and poetry               3

EFFECTIVE CITIZENSHIP
Follows directions promptly             B
Makes good use of free time             B
Completes work                          A
Takes care of property                  C
Accepts criticism                       B
Displays good sportsmanship             D
Uses courtesy in manner and speech      C
Cooperates in classroom                 C
Controls own freedom                    C

Fig. 19. Primary pupil report card.
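
The two sets of symbols on the Corvallis card amount to two lookup tables, one for letters and one for numerals. The sketch below is only an illustration of that organization: the wording of each entry is shortened from Figure 19, and the helper `interpret` is a hypothetical name, not part of the report card.

```python
# Code tables shortened from the report card in Figure 19.
INDIVIDUAL_EVALUATION = {
    "A": "using all his ability",
    "B": "using nearly all his ability",
    "C": "using about half of his ability",
    "D": "using less than half of his ability",
    "E": "using almost none of his ability",
}
SUBJECT_MATTER_EVALUATION = {
    1: "excellent",
    2: "above average",
    3: "average",
    4: "below average",
    5: "not meeting the standards for this subject",
}


def interpret(item, code):
    """Return a plain-language reading of one coded entry (hypothetical helper)."""
    if isinstance(code, int):
        # Numerals report academic status in the subject.
        return f"{item}: achievement is {SUBJECT_MATTER_EVALUATION[code]}"
    # Letters report the pupil's use of his own ability.
    return f"{item}: pupil is {INDIVIDUAL_EVALUATION[code]}"


print(interpret("Reads with understanding", 2))
print(interpret("Completes work", "A"))
```

Keeping the two tables separate mirrors the card's dual purpose: one symbol set answers "how does he stand in the subject?" and the other answers "how fully is he using his ability?"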
BEHAVIOR OBSERVATION RECORD

To understand behavior, it is important to observe the pupil’s 
reactions on the playground, in the neighborhood, and at home. 
Check words or phrases that describe the behavior of the pupil as 
you have observed it. Please feel free to individualize the report 
as much as possible by adding descriptive comments. If you know 
of reasons for the conditions you check, please jot them down at 
the right of your answers. 

Is this pupil physically strong?

____ Is strong and active
____ Seldom tires
____ Has ordinary endurance
____ Is listless, easily fatigued

Does he have good work habits?

____ Completes what he starts
____ Is able to evaluate his work
____ Capable of sustained attention
____ Needs urging to stay with a task
____ Is easily discouraged
____ Seldom completes the job
____ Easily distracted

Does he get along with other people?

____ Is a successful leader
____ Works and plays well with others
____ Earns recognition
____ Prefers to work by himself
____ Is destructive
____ Has bad temper when thwarted
____ Is quarrelsome
____ Is overaggressive
____ Is easily led
____ Often lies to get out of difficulties
____ Is disobedient to teachers
____ Has few friends
____ Is disliked and avoided by others

What is his usual disposition?

____ Cheerful, happy
____ Kind and sympathetic
____ Self-controlled, calm
____ Quiet, reserved
____ Impulsive
____ Stubborn
____ Moody


Fig. 18. Behavior-observation Record, used in the San Diego Public
Schools (San Diego, 1949).
Needs much prodding in doing ordinary assignments

No opportunity to observe

The ordering of the steps constitutes a problem in setting up 
rating categories for each trait. In many scales, constant al- 
ternatives are used; that is, a set of steps is established which 
applies to all traits included in the scale. The levels repre- 
sented may be “Excellent, good, fair, poor”; “Always, usually, 
frequently, seldom, never”; “Outstanding, above average, 
average, below average, inferior”; and so on. Coded number 
or letter forms of organization typically utilize constant al-
ternatives.

In general, ratings are likely to be more accurate if the steps
of the scale for each trait are set down in random order. That
is, the “good” and “poor” ends of the scale used in the graphic
form may be alternated in random fashion. This procedure
encourages the rater to examine each descriptive statement
and minimizes the tendency to check one or the other side of
the rating sheet continually.
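
One way to alternate the “good” and “poor” ends in random fashion is to let chance decide the orientation of each trait line when the rating sheet is laid out. The following is a minimal sketch under that assumption; the trait names, the function `assign_orientations`, and the use of a seed are all hypothetical details added for illustration.

```python
import random


def assign_orientations(traits, seed=None):
    """Randomly decide, per trait, which end of the printed line is the "good" end."""
    rng = random.Random(seed)  # a seed makes the layout reproducible for reprinting
    choices = ("good end at left", "good end at right")
    return {trait: rng.choice(choices) for trait in traits}


# Hypothetical traits for a graphic rating sheet.
traits = ["Attention", "Enthusiasm", "Cooperation", "Initiative"]
layout = assign_orientations(traits, seed=7)
for trait, orientation in layout.items():
    print(trait, "->", orientation)
```

Whoever scores the sheets must keep this layout at hand, since a check at the left of one line no longer means the same thing as a check at the left of the next.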

USING RATING DEVICES IN SCHOOLS 

As we have seen, rating pupils is one of the teacher’s cus- 
tomary responsibilities, for ratings are required for report 
cards and school records. Techniques have been designed to
improve ordinary rating devices such ... of educational ...
these techniques may be utilized for ...

Ordinarily, the task of reporting pupil status or progress
presents difficulties for the teacher. Should a pupil be
graded on the basis of “standards” or “expectations” for the
grade? On the basis of improvement or growth?
In the graphic type of rating scale, checks are placed along
a line. Steps in the graphic scale may be presented numer- 
ically, as coded letters or numbers, or as descriptive phrases. 
The following item is illustrative of the graphic organization 
utilizing descriptive-trait categories: 6 

3. Is his attention sustained?

Distracted:    Difficult to    Attends       Is absorbed   Able to hold
Jumps rap-     keep at a       adequately    in what he    attention
idly from      task until it                 does          for long
one thing      is completed                                periods
to another


The graphic presentation permits relatively rapid assessment 
of the results of the rating. 

Descriptive rating scales may be organized in a variety of 
ways. The distinctive feature of this type of scale is that de- 
scriptions indicate the various scale steps. The item for rating 
“attention” in the preceding paragraph combines the graphic
and descriptive forms of presentation. The descriptive scale 
may also be organized as follows: 7 


B  Does he need frequent prodding or does he go ahead
   without being told?

      Seeks and sets for himself additional tasks
      Completes suggested supplementary work
      Does ordinary assignments of his own accord
      Needs occasional prodding

6 Haggerty et al., ibid.

7 Adapted from American Council on Education Personality Report,
Form B. Washington: Committee on Personality Traits, American
Council on Education (mimeographed).
3. Rate one trait at a time. Whenever possible, it is advis-
able to rate all pupils on one trait before going on to the
next.

4. Examine ratings for indications of lack of distribution. 
Although frequencies should be greatest around the 
center of each trait scale, ratings should be dispersed 
over the length of the scale. 

5. Limit the number of traits which are to be considered in 
any one device. 

6. Rate a pupil only after adequate observation of the spe- 
cific characteristic which is being evaluated. 
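
Suggestion 4 can be carried out with a simple tally of how many pupils were placed on each step of a trait scale. The sketch below assumes ratings recorded as the numbers 1 through 5; the function names and the threshold of three steps are illustrative assumptions, not rules drawn from the text.

```python
from collections import Counter


def tally_by_step(ratings, steps=5):
    """Count how many pupils were placed on each step of one trait scale."""
    counts = Counter(ratings)
    return [counts.get(step, 0) for step in range(1, steps + 1)]


def lacks_distribution(ratings, steps=5, min_steps_used=3):
    """Flag a trait whose ratings fall on fewer than min_steps_used steps.

    The threshold of 3 is an illustrative assumption, not a rule from the text.
    """
    steps_used = sum(1 for count in tally_by_step(ratings, steps) if count > 0)
    return steps_used < min_steps_used


# Hypothetical ratings of one class on two traits (1 = low, 5 = high).
spread_out = [1, 2, 3, 3, 3, 4, 5]
bunched = [4, 4, 5, 4, 4, 5, 4]
print(tally_by_step(spread_out), lacks_distribution(spread_out))
print(tally_by_step(bunched), lacks_distribution(bunched))
```

A flagged trait is a signal to reexamine the ratings or the scale steps, not proof that the ratings are wrong; a class may in fact be much alike on a given trait.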


SUMMARY 

Rating devices represent a convenient means of compiling
data which provide a basis for the evaluation of pupils. Rat-
ings are typically subject to certain errors, but their limita-
tions can be minimized by (1) careful selection, definition,
and description of the characteristics to be rated, (2) estab-
lishment of a clear basis for making differential judgments,
and (3) organization of the scale to provide a meaningful
dispersion of ratings for each trait.

The traits selected for use in a rating instrument should be
relatively few in number and should be adapted to the pur-
poses for which the ratings are to be used. They should be
clearly defined and described, preferably in terms of observ-
able characteristics. The traits may be assessed on the basis of
either absolute or relative judgments. Rating instruments are
ordinarily organized in the form of (1) a check list, (2) a
coded number or letter device, (3) a graphic scale, or (4)
a descriptive scale.

... classroom. Grading systems, report cards, and cumulative rec-
ord forms involve the rating process.
comparison with others of his grade? These questions concern the
main problem of the establishment of a basis for judgment. 
Where established procedures exist in schools, evaluation may 
be improved through clear statement of purpose, definitions 
of traits, and clarification of the basis for judgment. Presenta- 
tions by teacher committees and discussions in staff meetings 
provide a means toward the development of common under- 
standings essential to meaningful evaluations. 

Report forms, of course, must be interpreted by parents. It 
is therefore advisable that the traits evaluated on reports be 
clearly defined as to the meaning and significance of ratings. 
Printed statements and discussions related to the develop- 
ment and use of report cards are often helpful in increasing 
parent understanding of the ratings. 

The teacher may utilize rating devices for a variety of use- 
ful purposes in the classroom. A few of the possible areas of 
usefulness are: 

1. Study of the work habits and skills of pupils. 

2. Study of pupil behavior in specified group activities 
(such as games, field trips, committee work). 

3. Evaluation of performance or products (as in handwrit-
ing, art work, speech, shop work, oral reading).

4. Pupil self-evaluation with respect to specified traits, ac-
tivities, and interests (such as cooperation on a field
trip, work habits, study skills, contributions to the
class).

The following suggestions are designed to serve as a guide 
to the teacher in the development and use of rating devices. 

1. In developing the device (scale or check list), relate it 
to educational objectives. 

2. State clearly the behaviors which are to be observed and 
rated. 
Guilford, J. P.: Psychometric Methods, New York: McGraw-Hill
Book Company, Inc., 1936.

Chapter 9 presents an excellent account of rating methods. Spe-
cific limitations and advantages of various types are indicated.

Jordan, A. M.: Measurement in Education, New York: McGraw-
Hill Book Company, Inc., 1953.

Chapter 18, “Measurement of Personality Traits,” includes a
discussion of rating scales. Illustrative materials are included.

Micheels, W. J., and M. Ray Karnes: Measuring Educational
Achievement, New York: McGraw-Hill Book Company, Inc.

Chapter 13 is concerned with observational techniques in rela-
tion to evaluation. Guiding principles are presented for using
the results of observations.

Remmers, H. H., and N. L. Gage: Educational Measurement and
Evaluation, rev. ed., New York: Harper & Brothers, 1955.

Chapter 12 includes a concise discussion of rating meth-
ods. Suggestions for the development of graphic scales are pre-
sented.

Thomas, R. M.: Judging Student Progress, New York: Longmans,
Green & Co., Inc., 1954.

Chapter 11 presents a relatively nontechnical account of rating
scales and check lists. The discussion centers around school use
of the instruments.

Thorndike, R. L., and E. Hagen: Measurement and Evaluation in
Psychology and Education, New York: John Wiley & Sons, Inc.

Chapter 13 presents a relatively comprehensive account of rat-
ing methods. Suggestions for the improvement of ratings are in-
cluded.



rating instruments to study and evaluate a wide range of pupil 
behaviors and attitudes, and to the extent that he improves his 
ability to develop and utilize instruments of this type, his pro- 
gram of evaluation will be less tied to those educational ob- 
jectives which are more readily assessed by means of the usual 
paper-and-pencil tests. 


STUDY AND DISCUSSION EXERCISES 


1. List educational objectives important in your teaching which
cannot be measured by paper-and-pencil tests. 

2. Discuss the merits and limitations of absolute and relative 
measures as they apply to rating instruments. 

3. What specific values do you see in pupil self-evaluation by 
rating devices? In what ways might self-rating scales be useful in 
your classroom? 


4. How would you develop a pupil self-rating scale to stimulate 
interest in neatness in written work? 

5. Develop a rating device to assist in the evaluation of the 
products or procedures of pupil work in any one of the following 
areas: shop, English, art, science, handwriting. 

6. (a) Select a subject area. Develop a definition and behavior
description of study skills or work habits in that area. (b) Or-
ganize an instrument based on your definition and descrip-
tion of the trait. (c) Present reasons for your selection of a par-
ticular type of scale organization.


SUGGESTED ADDITIONAL READINGS

Cronbach, L. J.: Essentials of Psychological Testing, New York.

Chapter 18 of this comprehensive text is a discussion of tech-
niques for observing behavior in normal situations. The values
and limitations of rating methods are considered.

Greene, E. B.: Measurements of Human Behavior, rev. ed., New
York: The Odyssey Press, Inc., 1952.

Chapter 16, “Types of Estimates,” includes a relatively com-
prehensive account of rating methods and devices.
considerable difficulty may be encountered by several pupils
for whom success was indicated by the test results. The dif-
ference may result from the fact that the reading materials
used in the second school were more difficult than those used
in the first.

Another example concerns achievement testing. In one 
school, pupils in the fourth and fifth grades show up year 
after year as substantially below the norm in arithmetic, 
though they do the work normal for their grade and age in 
other areas. Pupils who are above average in ability do above- 
average work in other subjects than arithmetic. In one such 
system the principal planned to bring in special help for the 
teachers because of their indicated need for guidance in teach-
ing arithmetic. The explanation was discovered to be the fact
that in this locality it had been previously decided that arith-
metic instruction could profitably be delayed until the fourth
grade rather than offered in the third. The disadvantage of
the delay does not disappear until two or three years later.
By the time a group reaches the seventh grade, more of the
pupils will be happy and successful in their work in arith-
metic if they started studying it in the fourth grade than if
they had started it in the third.

A third example of the influence of local norms was en-
countered in a school system where the formal study of Amer-
ican history was subordinated to the study of local problems
as an approach to history. The pupils did not do well on
standardized tests in which there was considerable emphasis
on American history.

Different schools in the same community may find it de-
sirable to interpret norms quite differently. A school which
draws its pupils exclusively from a neighborhood composed

1 W. A. Brownell and C. B. Chazal, “The Effects of Premature Drill
in Third-grade Arithmetic,” Journal of Educational Research,
29:17-28, 1935.



CHAPTER TWELVE 


Constructing and Using Teacher- 
made Tests 


Standardized tests produced by specialists have an important 
part to play in education when they are used with proper re- 
gard for their advantages and limitations. Some of these 
limitations can be avoided by using teacher-made tests. Tests 
prepared by the teacher compensate for some of the weak- 
nesses inherent in standardized tests, but they are in turn 
subject to certain shortcomings. They are not a panacea for 
problems of evaluation, but they do serve important purposes.
Care in test construction and interpretation can increase their
usefulness.


THE NEED FOR TEACHER-MADE TESTS 

As we have seen, standardized tests do not always fit local 
situations. For example, in one school a test of reading readi- 
ness is given to the entering first graders, and on the basis of 
the results certain pupils are started on the reading program. 
Gratifying success may be achieved by all these starters. How-
ever, when the same procedure is followed in another school,
progress toward a goal and to challenge one’s sense of achieve-
ment is a helpful educational instrument. Children can, and 
do in favorable circumstances, enjoy taking tests. When chil- 
dren fear tests, it is because of the emphasis placed on results. 

Although much is heard these days about the desirability of 
making motivation intrinsic, or making the task interesting in 
itself, it is more exact to observe that interests grow, develop, 
and evolve. Interests are much more than discoveries of some-
thing innate; they often develop as the result of the student’s
originally being “forced” to engage in a given area of experi-
ence. Interests grow as the result of knowledge and the de-
velopment of competence, of success, and familiarity. Hence,
an examination or series of examinations may serve as
original motivation for the pupil to check his knowledge and
progress and to gain success and familiarity. The teacher-made
examination, given at shorter intervals than the standardized
examination, can supplement other continuous experiences and
can easily be designed specifically as an additional source of
motivation.

Teacher-made examinations can also serve as an approach
to diagnosis; that is, the test can be so designed that the scores
pupils make will reveal areas in which they are weak. Weak-
nesses in number combinations, for example, or in the
arithmetical processes can be detected from the results of a
test which is so constructed that certain of the questions deal
with specific skills or areas of knowledge.

Teacher-made examinations are probably customarily used
to help evaluate pupil achievement. As we have seen, this is
not an easy task. However, with study and care, it is possible
to secure approximate and tentative data which will be of
value in determining pupil progress. In order to obtain this
information, the teacher-made test should be modeled after
the standard examination by seeking to improve the degree
of objectivity, reliability, and validity of the test.
of professional people and business owners may be unjustifi-
ably proud of the record made by its pupils. Teachers in a 
school whose pupils come mainly from lower socioeconomic 
strata and less stimulating environments may be discouraged 
because of low standing on national norms when, in fact, they 
might well be proud of their record in “pupil adjustment.” 

Another shortcoming of standardized tests is that they are 
not designed to explore and analyze small units of subject 
matter. Thus, the teacher may wish to give a test covering a 
half semester’s work or a unit on “Community Health Prac-
tices.” Tests can be of assistance in the study of these smaller
units, but the standardized test is not likely to help because 
of its comprehensive and general nature. 

Local variations in curricular practices, the nature of the 
pupil population, and the division of work into smaller units 
may make it impractical to use standardized tests as the sole 
measuring device. In such situations the teacher-made test 
can make a valuable contribution to better pupil under-
standing. 2


Uses of Teacher-made Tests

One of the values of teacher-made tests is that they com-
pensate for the shortcomings of standardized tests. Thus, as
shown above, teacher-made tests can be better adapted to
local curricular situations and are useful in examining
small units of study. In addition, they can be made to serve
the purposes of motivation and diagnosis of weaknesses.

Teacher-made tests can be used to supplement and comple-
ment other sources of motivation. It has previously been indi-
cated that it is bad practice to consider a test result as an
end of education. But the test which is used to indicate

HUniook CoZnyZTTS^pp" 4 ^“'""' New Yo *: McGraw- 
ful to the pupil. This type of question is probably best for test-
ing in such areas as arithmetic, knowledge of historical per-
sonages, geographical locations, and dates. It will be found
to be inadequate for testing a knowledge of social trends,
functions of the organs of the body, foods required in var-
ious diets, or commercial and agricultural products of nations
or states.

The true-false, or right-wrong, type of test item seems easy 
to construct but is actually so difficult to design that it has 
relatively little usefulness. Experts in test making rarely use it. 
True-false questions tend to place a premium upon verbatim 
learnings; since few things are so clearly right or wrong, an-
swers are often quite debatable, much to the chagrin of the
teacher who made the test. Further, this type of question tends
to penalize the brighter student, because it is he who
frequently thinks of the exception or conditional factor that
can alter the meaning. Let us examine the item, “Coins of
the United States are made of pot metal, copper, nickel, and
silver.” The statement is true in a sense, but gold might
have been included; thus it is false because it is not inclusive
enough. If the statement were changed to “Coins of the
United States are made only of pot metal, copper, nickel, and
silver,” the answer is still debatable. There are gold coins
in existence, but one could argue that they are not being made
now. The limited number of possible alternatives increases
the possibility of successful guessing and thus reduces the
diagnostic value of the test.

Since test makers show a tendency to make more items true
than false, the student may systematically mark all items
that he does not know as true and be gratified with the results.
In order to avoid this, penalties are sometimes imposed for
guessing, the total score being obtained by a “rights minus
wrongs” formula. This practice can be criticized on the grounds
that complicating the scoring of an inadequate test does not
make it more valid and reliable.
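
The arithmetic of the “rights minus wrongs” penalty can be shown in a short sketch. The general form used here, rights minus wrongs divided by one less than the number of alternatives, is the customary correction-for-guessing formula and is an addition for illustration; the text itself names only the true-false case. The function name and the sample figures are hypothetical.

```python
def corrected_score(rights, wrongs, alternatives=2):
    """Score with a penalty for guessing; omitted items are simply not counted.

    With two alternatives (true-false) this reduces to rights minus wrongs.
    """
    if alternatives < 2:
        raise ValueError("an item needs at least two alternatives")
    return rights - wrongs / (alternatives - 1)


# Hypothetical pupil: 50 true-false items, 38 right, 8 wrong, 4 omitted.
print(corrected_score(38, 8))

# The same counts on five-choice items carry a lighter penalty.
print(corrected_score(38, 8, alternatives=5))
```

The sketch makes the criticism concrete: the formula changes the number reported, but it does nothing to repair a debatable or ambiguous item.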
APPLYING THE CRITERIA OF A “GOOD” TEST

The criteria which apply to standardized tests are to a large 
extent applicable to teacher-made tests. Objectivity is desir- 
able; hence, it is recommended that tests be of the short- 
answer type in so far as possible. These would include true- 
false, multiple-choice, completion, and matching questions. 

Examinations of the so-called essay type are too difficult 
to score objectively to warrant a great deal of consideration. 
The contention that essay examinations teach pupils to or- 
ganize their thoughts can be disposed of with the argument 
that an examination, with its accompanying pressure, is not
a situation that is particularly conducive to the stimulation of
logical thinking. If organization of thought is the major ob-
jective, it might be better to offer this training in special papers
or themes. The teacher’s evaluation of the essay might be
more accurate when it is a special paper than when it is part
of an examination which must be given a grade.

One type of short-answer question is the completion item,
which requires the pupil to fill in a word or group of words
in blanks in a sentence or paragraph; the part of the sen-
tence which does appear gives the context into which the miss-
ing word or words will fit. Examples are: “Metals from which
our common United States coins are made are pot metal, cop-
per, and ...” and “... lunch cost 25 cents. In addition he
spent ...”

The completion item is difficult to formulate. All too
frequently there is more than one word that is ap-
propriate for a particular blank. After the teacher has made
one or two exceptions to what the answer
should be, it is difficult to determine how many exceptions
should be permitted. Completion questions have been criticized
because they call for factual knowledge. There should
be no objection to the learning of facts if they are meaning-
Matching questions are time-consuming for the student, since
he has to search for the relationships; and for the elementary 
pupil they may be confusing as well as time-consuming. Hence 
relatively few items should be grouped together. The columns 
should be of different lengths so that thinking will replace 
guessing at some of the more difficult items. Primary-grade 
teachers have found that it is easier for pupils to understand 
the directions if they are told to use a line to connect the two
related statements. This makes scoring somewhat harder, but 
the advantage in pupil understanding may compensate. A com- 
bination matching-completion question can be made by pro- 
viding a group of words or phrases from which the pupil can 
select to fill in the missing parts of a sentence or paragraph. 

The user of standardized tests will note that the most fre- 
quently used type of test question is the multiple-choice item. 
This question is commonly found in the “test yourself” fea-
tures in magazines and newspapers. It possesses several ad-
vantages: the number of alternate responses (3, 4, or 5) re-
duces the chances of guessing more than is the case with the
true-false or matching type of question; the listing of plausible
answers stimulates thinking; the limitation (as compared with
the completion item) of possible answers eliminates ambi-
guity in scoring; and the technique of scoring is not compli-
cated. Multiple-choice questions are good teaching devices
because discussion of the alternatives and analysis of the stu- 
dent’s errors after the examination provides the opportunity 
for careful explanation. 
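
The way additional alternatives reduce the chances of guessing can be put in numbers. The sketch below assumes blind guessing among equally attractive alternatives, which is an idealization; the function name and the 60-item test are hypothetical.

```python
def expected_chance_score(n_items, alternatives):
    """Average number of items a pupil would get right by blind guessing."""
    return n_items / alternatives


# Hypothetical 60-item test built with 2 (true-false), 3, 4, or 5 alternatives.
for alternatives in (2, 3, 4, 5):
    print(alternatives, expected_chance_score(60, alternatives))
```

On this assumption a pupil who knows nothing averages 30 items right on a 60-item true-false test but only 12 on a five-choice test, which leaves far more of the score range to reflect what he actually knows.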

Some of the advantages of the multiple-choice item are
counterbalanced by the difficulty of making the questions,
however. It takes considerable time to construct fifty or a
hundred items of this type, certainly much more time than
it normally takes to construct ten essay questions. On the
other hand, the time is compensated for by increased objec-
tivity and ease of scoring.
However, the true-false item is frequently used in the class-
room, although it should not be relied upon too heavily.
Measures can be taken to increase its usefulness:


1. All items should be brief and without conditional factors. 

2. The use of such words as always, never, entirely, and 
absolutely should be avoided. 

3. The true-false item is more useful in language studies 
and mathematics than it is in social studies and general 
science. 

4. Statements should not be lifted from the textbook 
verbatim or with only minor revisions. 

5. Items should not be arranged in a regular pattern, such 
as T, F, F, T, F, T, T, F, etc., or T, F, T, F, etc. 


In general, it seems wise to recommend that true-false 
questions be cautiously used except for purposes of review 
and drill. Their use for evaluative or diagnostic purposes is 
highly questionable. 

The matching question has been found to be quite prac- 
ticable for classroom use. Two lists are set off or distinguished 
as pairs, as in the following example: 


Place the letter of the item in the right-hand column in the space
provided in the numbered (left-hand) column with which it is
most closely associated:

____ 1. heart               a. helps put oxygen into the blood
____ 2. lungs               b. place where food is mechanically and chemically reduced
____ 3. thyroid             c. carries blood to the heart
____ 4. arteries            d. muscle which pushes blood
____ 5. striated muscles    e. carries blood to the extremities
____ 6. stomach             f. muscles used in digesting food
                            g. controls oxygen metabolism
                            h. muscles used in locomotion
                            i. muscles used in breathing
the pupils to refrain from marking the question sheet. Defec-
tive items on the reused question sheet can be ruled out and 
a substitute item can be written on the board or mimeo- 
graphed on a separate page to replace the deleted item. 

The multiple-choice item satisfies to a large extent the 
criteria of a good test: it is objective, reliable, and economical 
of the teacher’s time; it samples widely and can be so planned 
that it has a significant degree of validity. Techniques for se- 
curing this validity will be discussed in the following section. 


TECHNIQUES OF TEST CONSTRUCTION 

In constructing tests, it is important first of all to determine
just exactly what should and will be tested. Since the purpose
of tests is to help determine the extent to which educational
objectives are being achieved, the test should be devised in
terms of the specific objectives of teaching a particular unit of
study. The teacher who prepares lesson plans will have done
this much earlier. For those who do not write lesson plans, it
would still be desirable to state the objectives that will serve as
a guide to the construction of the items that really test what
one has been teaching. This is clearly a long step toward mak-
ing a valid test, a test that actually measures what it purports
to measure. Comparing each item with the final objectives of
the test will not assure validity, but it will probably increase it.

The goals or objectives of a unit must be specific in order
to serve as a guide in making a valid examination. For ex-
ample, such specificity is found in the following goals for the
student in a unit in seventh-grade social studies:

1. Reads news of general (first-page) interest in the newspaper.

2. Listens to the radio for purposes of gaining information
(weather, farm reports, news).

3. Can relate the importance of some current news events.

4. Has some opinions on contemporary events.
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It is recommended that the making of good multiple-choice 
items be a continuing project for teachers. This can be done 
— at a saving of time over a period of years — by making a set 
of 3-by-5 test cards for the various areas (social studies, 
health, science) with which the teacher deals. Each card will 
contain one multiple-choice item. A notation on the card in- 
dicates the phase of the subject with which it deals (history, 
“Pilgrims”). When it is time to make the test, the items that 
are most pertinent to the particular manner in which the unit 
was studied during the term are selected to be reproduced. 
After the test and the discussion of the items, an item analysis 
will reveal that some are of questionable value. A tally is kept 
on the effectiveness of each question. Some will be correctly 
answered by all; too many of these items will indicate that the 
test is too simple. If one item is missed by all, it is probably 
too difficult or is ambiguously stated. Poor questions are either 
revised or discarded. The next time the same area is to be 
covered by a test, a few new items are added to the revised 
set of cards to cover current emphases. By keeping separate 
the cards dealing with subdivisions of the total area, the 
teacher can easily make the test contribute to diagnostic pur- 
poses. 
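
The tally described above can be sketched in a few lines of Python. This is only an illustration: the function name `item_analysis`, the sample answers, and the wording of the flags are assumptions made for the example, not part of the card system itself. It counts how many pupils missed each item and marks the items answered correctly by all or missed by all.

```python
from typing import Dict, List


def item_analysis(key: List[int], papers: List[List[int]]) -> Dict[int, dict]:
    """Tally misses per item and flag items that all pupils got right or wrong.

    key    -- correct response number for each item (hypothetical data)
    papers -- one list of chosen responses per pupil, in item order
    """
    n_pupils = len(papers)
    report = {}
    for i, correct in enumerate(key):
        missed = sum(1 for paper in papers if paper[i] != correct)
        if missed == 0:
            flag = "answered correctly by all (possibly too simple)"
        elif missed == n_pupils:
            flag = "missed by all (too difficult or ambiguous)"
        else:
            flag = "keep"
        report[i + 1] = {"missed": missed, "taking": n_pupils, "flag": flag}
    return report


# Made-up answer key and three pupils' papers for a three-item test.
key = [3, 2, 1]
papers = [[3, 2, 4], [3, 1, 5], [3, 2, 2]]
report = item_analysis(key, papers)
for number, row in report.items():
    print(f"Item {number}: missed by {row['missed']} of {row['taking']} - {row['flag']}")
```

The flags only point to cards worth a second look; whether an item is revised or discarded remains the teacher's judgment.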

Although the questions are discussed after the test, the stu- 
dent does not keep his test. To permit him to do so might lead 
some of the more sophisticated pupils to get the exam and 
cram for the specific questions on it rather than to study 
widely. Just as important, however, is the fact that the teacher 
cannot afford the time to make a carefully constructed new 
set of multiple-choice questions every time the area is covered; 
besides, there would be a loss in terms of the experience
gained. Economy of the teacher’s time can also be achieved 
by providing a series of spaces on the left or right side of the 
paper in which to place the number or letter of the chosen 
response. After the test items have been checked through use, 
it may be advisable to have a separate answer sheet and ask 
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sponse to the problem stated above, each of the following 
has some plausibility: “(1) examine the water supply, (2) 
inoculate the citizens of the community, (3) examine the 
milk supply, (4) screen all windows, (5) spray all refuse 
piles and garbage cans with DDT.” 

The wording of the questions should be appropriate to the 
grade level concerned: if the items are too easy, the difficulty 
should not be increased by the introduction of more difficult 
words unless vocabulary development is the goal. No answer
should depend upon knowledge of the answer to another ques- 
tion in the same examination. Conversely, the information 
given in stating an item should not provide a lead to answer- 
ing another question. The statement and the alternatives 
should be as simple as possible — the correct answering of the 
question should not depend on the pupil’s ability to interpret 
a difficult statement. The answer which is supposed to be 
correct should be unquestionably correct; that is, the various 
books available to pupils should agree on the point concerned. 
The teacher should never have to resort to saying, “In our 
book the answer is ... ” Alternative answers should cite 
commonly held erroneous views as a means of sharpening the 
pupil’s perception of unjustified beliefs; for example, “A universal
characteristic of adolescents is (1) they are physically
awkward, (2) they resist school authority, etc.” Whenever
possible, test for knowledge of principles and generalizations
as contrasted to isolated facts. Tests of memory show that
facts are forgotten more quickly than principles and generalizations,
which have greater significance than facts for solving
problems later.

These observations may make the task of test making look
formidable. Actually, practice and guided experience reduce
the difficulty. Soon the teacher gives almost automatic heed
to the suggestions cited above, and usable test items occur to
him readily as the study of a unit progresses. A pair of teachers
working together can be of great help to one another.
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5. Knows the names and positions of persons in the head- 
lines. 

6. Knows the geographical location of places in the news. 

7. Knows what sections of the paper contain certain kinds 
of news. 

8. Is acquainted with several features in the local news- 
paper. 

Such a list of specific aims can readily be translated into 
test items. The teacher can determine the number of questions 
which should be allotted to each goal by analyzing its relative 
importance and reflecting on the amount of time spent on the 
particular topic in class. Robert M. W. Travers recommends 
that the teacher keep a “blueprint” of the class as a guide in 
making a valid examination. 3 To do this, the teacher lines off 
a sheet of paper in blocks and labels the horizontal blocks 
with the educational goals for the topical subdivisions of the 
course, represented by the vertical blocks. Thus, under the 
heading of the educational goal “ability to spend money 
wisely,” reading across to the vertical column under the head- 
ing of “budget,” the teacher writes descriptions of the activi- 
ties by means of which one reaches the goal. When it is time 
to prepare a test, the entries in the boxes give clues to suitable 
items. 
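
The allotment of questions to goals can be worked out as simple proportional arithmetic. The sketch below assumes the teacher has already expressed relative importance and class time as a single weight per goal; the function name `allot_questions`, the weights, and the 25-item test length are all hypothetical. Rounding uses the largest-remainder rule so that the counts add up to the intended length.

```python
from typing import Dict


def allot_questions(weights: Dict[str, float], total_items: int) -> Dict[str, int]:
    """Split total_items among goals in proportion to their weights.

    Uses largest-remainder rounding so the counts always sum to total_items.
    """
    weight_sum = sum(weights.values())
    exact = {goal: total_items * w / weight_sum for goal, w in weights.items()}
    counts = {goal: int(share) for goal, share in exact.items()}
    leftover = total_items - sum(counts.values())
    # Hand the remaining items to the goals with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda g: exact[g] - counts[g], reverse=True)
    for goal in by_remainder[:leftover]:
        counts[goal] += 1
    return counts


# Hypothetical weights: the teacher's judgment of importance and class time.
weights = {
    "reads first-page news": 3,
    "listens to radio for information": 2,
    "knows persons in the headlines": 3,
    "knows location of places in the news": 2,
}
counts = allot_questions(weights, 25)
print(counts)
```

The weights themselves are the substantive decision; the arithmetic only keeps the test in proportion to them.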

The following criteria will be helpful in making multiple- 
choice questions: The key proposition should be stated in the 
form of a problem; for example, “The first thing to do on 
learning of a case of scarlet fever in the community is to (1)
. . .” This type of presentation is important even in testing
for facts, because the ultimate goal is that pupils will use 
the facts to solve problems. The alternative responses should 
be as plausible as possible; unless they have some plausibility, 
the choice will be so easy that no problem is involved. In re- 

3 Robert M. W. Travers, How to Make Achievement Tests, New
York: The Odyssey Press, Inc., 1950, pp. 25-29.
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Health Education 
11-12-57 

Pupil’s Name 

Place the number of the response which you select as correct in 
the space provided to the left of the number of the question. The 
first one is answered correctly. 

__3__ 1. The use of beverage alcohol is condemned because

(1) it speeds up heart action

(2) it causes diseases of the liver

(3) it reduces physical and mental efficiency

(4) it slowly disintegrates the brain

(5) it hardens the arteries

_____ 2. Milk should be in the diet of most persons, adults and
children, because

(1) it is the food Nature planned for us

(2) it contains so many ingredients that it rounds out
the diet

(3) it is essential to growing sound teeth

(4) it is a clean, safe food

(5) it is inexpensive

It can readily be seen that a multiple-choice examination
takes several pages of mimeographing; for this reason, and
because pages must be turned for scoring, it is quite
time-consuming. A separate answer sheet with a number of blanks
on it helps to offset this disadvantage.


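Pupil’s Name _____          Subject _____


Place all of your answers on this answer sheet. Do not write on
the question sheets.


 1. _____     26. _____     51. _____     76. _____
 2. _____     27. _____     52. _____     77. _____
 3. _____     28. _____     53. _____     78. _____
 4. _____     29. _____     54. _____     79. _____
 5. _____     30. _____     55. _____     80. _____
 etc.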
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even though the task of making valid and objective tests is 
an arduous one, it will pay in the increased effectiveness of 
the teacher’s testing program. Testing will serve the purpose 
for which it is designed: to facilitate the reaching of one’s edu- 
cational objectives. 


Some Sample Setups 

Careful attention to “setup” will help to make tests under- 
standable and economical. For example, questions can be filed 
on 3-by-5 cards such as the one shown here. 


health physiology 

The use of beverage alcohol is condemned because 

(1) it speeds up heart action 

(2) it causes diseases of the liver 

(3) it reduces physical and mental efficiency 

(4) it slowly disintegrates the brain 

(5) it hardens the arteries 

Missed by ____ out of ____ taking the test.

Date used: 

Pupil comments: 


The notation in the upper left-hand corner indicates the broad 
area in which the question is used, and the note at the upper 
right gives the particular subdivision. The other notes can be 
reduced to 12/37, which means that the question was missed 
by 12 of 37 pupils; the date can be simply 11/12/57, and 
comments may be placed on the reverse side of the card. 
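
For a teacher who keeps the file in a notebook or on a computer rather than on cards, the same card can be represented as a small record. The sketch below is one possible layout; the class name `ItemCard` and its field names are assumptions chosen for the illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ItemCard:
    area: str                 # upper left-hand corner, e.g. "health"
    subdivision: str          # upper right-hand corner, e.g. "physiology"
    stem: str
    responses: List[str]
    # One entry per use of the item: (date, missed, taking).
    uses: List[Tuple[str, int, int]] = field(default_factory=list)

    def record_use(self, date: str, missed: int, taking: int) -> None:
        self.uses.append((date, missed, taking))

    def notation(self) -> List[str]:
        """Return the condensed notes, e.g. '11/12/57  12/37'."""
        return [f"{date}  {missed}/{taking}" for date, missed, taking in self.uses]


card = ItemCard(
    area="health",
    subdivision="physiology",
    stem="The use of beverage alcohol is condemned because",
    responses=[
        "it speeds up heart action",
        "it causes diseases of the liver",
        "it reduces physical and mental efficiency",
        "it slowly disintegrates the brain",
        "it hardens the arteries",
    ],
)
card.record_use("11/12/57", 12, 37)
print(card.notation())
```

Keeping the area and subdivision as separate fields makes it easy to pull out all the cards for one subdivision when a diagnostic test is wanted.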

After the cards have been prepared, those the teacher se- 
lects are placed on the test paper with appropriate headings: 
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The scoring sheet for this type of answer sheet consists of a 
piece of stiff paper or cardboard with holes punched so that 
only the correct responses show. 



[Sketch: cardboard scoring stencil with a hole punched over each correct response]


In using this kind of scoring device it is necessary first to
scan the papers for double answers. (This is necessary also
when papers are machine scored, so there is no relative
disadvantage in this respect.) Each number that appears clear
under the punch hole and thus has not been blocked out by
the pupil is an incorrect answer; hence all one has to do is
count these for the minus score. If an item analysis is to be
made, it will be necessary to cross out the number with a
colored pencil as one counts the incorrect scores.
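
The logic of the punched stencil can be stated as a short routine. The following is a sketch only: the function name `score_blocked_sheet` and the sample sheet are invented for the illustration. Each item on the sheet is represented by the set of numbers the pupil blocked out, and the key gives the response over which the hole is punched.

```python
from typing import List, Set


def score_blocked_sheet(key: List[int], sheet: List[Set[int]]) -> dict:
    """Score a sheet on which the pupil blocks out the chosen response.

    key   -- correct response for each item (where the stencil holes are punched)
    sheet -- for each item, the set of numbers the pupil blocked out
    """
    # First pass: scan for double answers, which are scored as incorrect.
    double_answers = [i + 1 for i, marks in enumerate(sheet) if len(marks) > 1]
    incorrect = []
    for i, (correct, marks) in enumerate(zip(key, sheet)):
        # A number still visible through the hole was not blocked out by the pupil.
        visible_through_hole = correct not in marks
        if visible_through_hole or len(marks) > 1:
            incorrect.append(i + 1)
    return {
        "double_answers": double_answers,
        "incorrect": incorrect,
        "score": len(key) - len(incorrect),
    }


# Made-up key and one pupil's sheet: item 3 is a double answer, item 4 is blank.
key = [3, 2, 1, 5]
sheet = [{3}, {1}, {1, 4}, set()]
result = score_blocked_sheet(key, sheet)
print(result)
```

The list of incorrect item numbers is the same information the colored pencil records when an item analysis is to follow.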

The danger in this system is that, since the original setup is
so time-consuming, the teacher will be tempted to use the
same questions and the same answer sheets term after term.
Actually this is not undesirable, providing the defective or
outmoded items are constantly weeded out. This can be done
simply by telling the pupils that question 23, for example, has
been eliminated. “A substitute question 23 is written on the
board [or on a mimeographed separate sheet]. Answer this
question now, immediately, so that you will not forget and
answer the question that is on the regular test sheets.” It will
be convenient if the question is so arranged that the correct response
is the same as the one originally designated for the
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Such an answer sheet can be scored by writing the correct 
responses on strips of stiff paper or cardboard and laying the 
appropriate strip alongside the column of answers. This proc- 
ess involves the changing of strips or the shuffling of papers 
as each column is scored. However, scoring can be completed 
in one operation by cutting slots out of a piece of cardboard 
and writing beside each slot the answers for one of the col- 
umns, as in the accompanying sketch. The number of the 



question is not indicated, since this would clutter the score 
card. Errors can be avoided by being careful to make the slots 
the exact length of the answer column. 

Perhaps the most rapid hand-scoring method is to provide 
an answer sheet on which the student has to block out the cor- 
rect response, as follows: 

Pupil’s Name _____          Date _____

Subject _____


Completely block out with soft lead pencil the number of the
response which you select. Indicate only one answer for each item;
double answers are scored as incorrect.


 1. 1 2 3 4 5     26. 1 2 3 4 5     51. 1 2 3 4 5     76. 1 2 3 4 5
 2. 1 2 3 4 5     27. 1 2 3 4 5     52. 1 2 3 4 5     77. 1 2 3 4 5
 3. 1 2 3 4 5     28. 1 2 3 4 5     53. 1 2 3 4 5     78. 1 2 3 4 5
 4. 1 2 3 4 5     29. 1 2 3 4 5     54. 1 2 3 4 5     79. 1 2 3 4 5
 5. 1 2 3 4 5     30. 1 2 3 4 5     55. 1 2 3 4 5     80. 1 2 3 4 5
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view providing it is motivated and the pupil understands the 
material. The inexpensive teacher-made test can provide 
some of this drill in a situation which is enjoyable for the 
pupil if the score is not overemphasized. The test can also 
provide a check on understanding, since the items are spe- 
cifically designed to be discussed in class, whereas the discus- 
sion of items on a standardized test is specifically avoided be- 
cause it would produce “practice effect” or coaching that 
would invalidate the test. 

The teacher-made test is a useful factor in motivation. The 
knowledge gained in preparation for tests has led many pu- 
pils to develop new interests. It can accomplish this for more 
pupils when teachers stop making scores the basis for inter- 
personal comparisons and use the results to show each pupil 
what progress he is making and where he needs special work. 
The teacher should constantly bear in mind, however, that 
this transfer of interest from the test and its results to the sub- 
ject under consideration is not automatic. It will be necessary 
for him to show how the interest should expand, how the 
knowledge can be used more effectively than it is in a pencil- 
and-paper test, and to indicate the personal value of increased 
knowledge. 

Thus standardized tests are of greatest value in estimating
achievement over a period of time (from the beginning to
the end of the term), whereas teacher-made tests are of greatest
aid in facilitating intermediate steps in this long-term
growth.


SUMMARY 

Standardized tests have inherent limitations. They often do
not fit local situations from the viewpoint of the type of materials
used, curricular emphases, and the ability and background
of the pupil population. Teacher-made tests can be
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item; in this way the answer sheet or stencil will not have to 
be changed. 

It is recommended that no scoring formulas be used, such 
as the right-minus-wrong (R - W) scoring of true-false answers
to discourage guessing. Actually, differences in children
are such that some will not be discouraged from guessing and 
others will not put down answers they are not sure of. Per- 
haps it is better to encourage intelligent, informed guessing 
than to discourage blind guessing. At any rate, the accuracy 
of the teacher-made instrument is not so great that its reliabil- 
ity will be significantly increased by scoring formulas. Fur- 
ther, it is not the score that is significant. Rather the object is 
to discover what areas need particular attention, what is caus- 
ing a pupil’s particular difficulty, and approximately what 
progress each pupil has made. Scoring formulas will not help 
to a significant degree in any of these purposes. 
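
For readers unfamiliar with the right-minus-wrong formula, a short worked example may make clear what it does to scores. The two pupils and their results below are invented for the illustration, and the example is offered to show the arithmetic, not to recommend the formula.

```python
def right_minus_wrong(right: int, wrong: int) -> int:
    """Corrected true-false score: items right minus items wrong (omits ignored)."""
    return right - wrong


# Hypothetical pupils on a 50-item true-false test.
pupils = {
    "A (answers every item)": {"right": 35, "wrong": 15},
    "B (omits 10 doubtful items)": {"right": 30, "wrong": 10},
}
for name, p in pupils.items():
    raw = p["right"]
    corrected = right_minus_wrong(p["right"], p["wrong"])
    print(f"{name}: raw score {raw}, R - W score {corrected}")
```

In this made-up case the two pupils differ by five points on the raw score and tie under the formula. The change reflects their willingness to guess and adds nothing about which areas need attention.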


RELATIONSHIP OF TEACHER-MADE TO 
STANDARDIZED TESTS 

Both teacher-made and standardized tests play important 
roles in the accomplishment of the ultimate purpose of all
tests: to facilitate pupil growth. Standardized tests are probably
more accurate than most teacher-made tests; they are
more reliable, objective, and valid; but teacher-made tests 
have the advantage of being more readily adaptable to local 
conditions. They are relatively less expensive and can thus be 
used more frequently as checks on progress, as a means of 
motivation, and in some instances for aiding diagnosis. Since 
both kinds of test have a part to play in the understanding of 
pupils and in the stimulation of their growth, it is obvious that 
the teacher is not limited to the use of either one or the other 
exclusively; the two are supplementary to one another. 

Most modern educators have no objection to drill and re-
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4. Make your own set of fifteen or twenty multiple-choice items 
to cover the content of a chapter or several chapters of this book 
and give it to others to take and then to criticize. Summarize what 
you learned about multiple-choice questions from this experiment. 

5. Write out the objectives of a class you have taught or are 
preparing to teach and submit the list for criticism. Design the 
objectives in such a way that testing on them is feasible. 

6. Examine the feature “It Pays to Increase Your Word 
Power” in any issue of the Readers’ Digest. Point out the in- 
stances in which the author has attempted to mislead by present- 
ing a “plausible” but incorrect response. How can you use this 
practice profitably in test construction? 

7. Poll a group of persons who are interested in tests and sum- 
marize the techniques they suggest for improving such tests as 
the short test to stimulate interest, the pretest at the beginning of
a unit of work, etc.


SUGGESTED ADDITIONAL READINGS 

Greene, Harry A., Albert N. Jorgensen, and J. Raymond Gerberich:
Measurement and Evaluation in the Elementary School,
2d ed., New York: Longmans, Green & Co., Inc., 1952, pp. [illegible]-194.

This chapter deals with different types of objective questions
(completion, multiple-choice, matching, etc.) and cites helpful
suggestions for constructing them. The suggestions for each
kind of test item are summarized.

Lee, J. Murray, and David Segel: Testing Practices of [illegible]
Teachers, U.S. Department of the Interior Bulletin 9, Office of
Education, 1936, 42 pp.

This bulletin reports a survey of testing practices and
their effectiveness as a basis for making suggestions for improvement.
The bulletin is designed for administrators and
forward-looking teachers.

Ross, C. C. (rev. by J. C. Stanley): Measurement in Today’s
Schools, 3d ed., Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1954,
pp. 139-206.

Part II, consisting of three chapters, deals with [illegible]
preparing, and evaluating teacher-made tests. Different kinds of
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used to compensate for some of these limitations because they 
are readily adaptable to local emphases, they can be devised 
to fit small subdivisions of a subject area, and they can and 
do serve as sources of motivation. When carefully made and 
used with due caution, the teacher-made test is effective for 
diagnosis. 

Teacher-made tests should meet the criteria of good stand- 
ardized tests. Objectivity can be increased by using short- 
answer items broad enough in range to reveal ability and lim- 
ited enough to be easily scored. Validity can be increased by 
clearly formulating the aims of the unit or area on which the 
test is based. Reliability can be increased by continuous study 
of the teacher-made test through periodic analysis of pupils’ 
answers. Ease of scoring should be kept in mind in construct- 
ing the test, and special answer sheets and scoring stencils 
will increase economy. The authors feel that the over-all ad- 
vantages of matching and multiple-choice answers are such as 
to warrant preferring them over completion and true-false 
items in short-answer tests and over the essay type of test. 

Teacher-made tests are not intended as substitutes for 
standardized tests. Both types play valid roles in education, 
and both should be regarded as aids to instruction and as 
supplementary to each other, not as ends of education. 

STUDY AND DISCUSSION EXERCISES 

1. What advantages of teacher-made tests over standardized 
tests have you found, through your reading or experience, other 
than those listed in this chapter? 

2. Under what conditions is it permissible for the teacher to 
use subjective data in the evaluation of his pupils? 

3. Recall some of the experiences you had as a student taking 
true-false examinations. Would your experiences accord with the 
observations made in this chapter about the use of this type of 
question? 



CHAPTER THIRTEEN 


Improving Appraisal Practices 


Pupil development is a many-faceted phenomenon. Aspects of
physical, mental, emotional, social, and academic growth are
present in problems of measurement and evaluation. Many
factors are at work to produce growth in any one of these
areas or to produce interrelated (organismic) growth in all the
areas. Among these factors the following are outstanding:
hereditary potential, health, sensory equipment, home conditions,
family relationships, community mores, political philosophy,
curricular demands, educational philosophy, and the
child’s reactions to all these. Many techniques for measuring
these varied facets of the total personality have been presented
in the foregoing chapters, and problems involved in measurement
have been discussed. In view of the multifaceted nature
of growth, it seems absurd to attempt to evaluate it
with a single number, letter, or word. Yet the fact is that this
attempt is made in what is called the “grading” or “marking”
system.

The authors claim no originality in condemning the practice
of attempting to summarize the many factors of growth
with a simple number, letter, or word. Rather, our remarks
reflect investigations by experts, critical examination of prac-
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test items and the special problems each presents are dealt with, 
and suggestions are offered. 

Torgerson, Theodore L., and Georgia Sachs Adams: Measurement 
and Evaluation, New York: The Dryden Press, Inc., 1954, pp. 
220-243. 

The uses and characteristics of good teacher-made tests are de- 
scribed. Suggestions are given for making essay, completion, 
true-false, multiple-choice, and matching questions. A check 
list is provided for evaluating teacher-made tests. 

Travers, Robert M. W.: How to Make Achievement Tests, New 
York: The Odyssey Press, Inc., 1950, 180 pp. 

This short book is full of practical suggestions for planning and 
constructing teacher-made tests. It covers all subjects, but sci- 
ence teachers will find the explanations and examples especially 
helpful. 
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fuses the symbols of learning with the products of education. 
Learning is a sufficient reward for the first grader, but the 
symbol has become more important to the sophisticate. 

2. Marks tend to emphasize subject matter. But among 
aims of elementary education are understanding and practice 
of cooperative social functioning; opportunity to exercise 
habits of reflective thinking; exercise of individual capacities; 
command of the fundamental processes of reading, writing, 
and arithmetic (communication); and gaining and keeping 
good physical and mental health. Marks tend to emphasize 
only the aim relating to academic accomplishment, which, admittedly,
is very important; but this academic knowledge and
skill is simply a tool for helping one to achieve the other aims
on the list. Concern for marks sometimes results in an emphasis
upon subject matter which may actually limit the possibility
of the pupil’s attaining the other goals.

3. Marks tend to discourage good teaching. At the risk of
oversimplifying, we may define teaching as guiding or encouraging
each child to come progressively closer to realizing
his own potentialities in all aspects of growth. Thus teaching
involves an intimate knowledge of the children one is teaching,
the development of personal ambition, social orientation,
originality (or, at least, uniqueness), and moral and ethical
values. Teachers who use marks may keep these objectives
in mind; but some teachers employ marks as a threat
when their teaching methods fail. If the child does not see the
value of assigned tasks or if he is worried about out-of-school
situations, he can still be made to conform by the threat of
a low mark or failure. Problems of getting to know the child,
encouraging growth, and promoting self-realization need
not necessarily be considered when one can use grades as a
cudgel.

4. Marks tend to cause teachers to overlook individual differences.
Every teacher is aware of the great individual differences be-
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tice, and the reported experience of many teachers. Practices 
that hold a great deal of promise for the more effective stimu- 
lation of symmetrical pupil growth are already in operation. 
However, we should not be so sanguine as to believe that the 
answer to the question, “What are the best evaluation tech- 
niques?” has been given. As more teachers depart from tradi- 
tional marking practices, better answers to this question will 
be given. Meanwhile, the departures that have been made will 
give teachers some idea of techniques which have been 
gratifying in bringing educational practice into closer accord 
with accepted child-growth theory. 


SOME SHORTCOMINGS OF GRADES 


The purpose of marks and appraisal is, theoretically, to 
foster pupil growth. They purport to tell something about the 
pupil that will make it easier for him and those who work 
with him to guide his future development. Although this is the 
theory behind grading, there are many practical reasons for 
doubting that it accomplishes this worthy aim. 

1. Marks tend to become the end and aim of education. 
William H. Burton’s statement represents a consensus of pub- 
lic school workers when he asserts that a misconception of 
education is that the symbols of education are equivalent to 
the outcomes of learning. 1 If you ask a first grader what he 
got out of school on a particular day, he will say, “I learned 
to read a story,” “I learned to spell my name,” or “I learned 
to print my name.” If this same question is asked of a sixth 
grader he is likely to say, “I got an 80,” or “I got a B.” If the 
college student in general psychology is asked what he got 
from the course, he will probably indicate that he, too, con- 


1 William H. Burton, The Guidance of Learning Activities, New
York: Appleton-Century-Crofts, Inc., 1944, pp. 52-59.
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to believe that they indicate to the pupil and his parents how 
the pupil is getting along. The truth is that, as parents, teach- 
ers, and pupils we have become accustomed to a symbol that 
has little meaning, as with men’s wearing vests. Study after 
study has shown that the same paper will be graded by differ- 
ent teachers with a different value. Some teachers give a large 
proportion of A’s, and others say, “No student can be perfect, 
and A means perfect.” Some teachers have in mind academic 
accomplishments alone when they give a mark; others try to
include such factors as industry, interest, sincerity, and orig- 
inality. Many parents really have no idea how well their child 
is doing in relation to his ability or in relation to other chil- 
dren, but they are pacified by some meaningless jargon ex- 
pressed as a grade. Parents who become accustomed to an im- 
proved form of evaluation assert that they do not see how they 
could have been satisfied with the old marking system.

Contrasts between Marks and Appraisal 

The seven items listed above are enough to indicate the 
reason for the present trend away from grades to more informative
methods of evaluation. In fact, because of these
characteristics of grades, we might even distinguish grading
from genuine evaluation, as the following contrasts in concept
indicate.

Both grades and evaluation are supposedly means of communication
between teacher and pupil and between teacher
and parents. But grades are likely to be communication to,
whereas other forms of evaluation are often communication
with.

Grades are ordinarily assigned on an absolute scale which
places a high value on interpersonal competition and rivalry.
True evaluation, on the other hand, stresses competition with
oneself and places a premium upon the ability of the person
to cooperate in work and play with others.
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tween pupils; yet, all too frequently, the necessity for grading 
causes them to try to bring the slow growers academically “up 
to average.” We know the futility of this attempt, yet it per- 
sists almost as a compulsion because of the pressure of grades. 
On the other hand, grades encourage mediocrity in brighter 
children because they can get satisfactory marks with the ex- 
penditure of little or no effort. 

5. Marks create a situation that is “unlike life.” It is fre- 
quently argued that grades are a lifelike phenomenon — 
that we are all graded in our commercial, industrial, and pro- 
fessional careers. We are, to some extent, graded in our voca- 
tional lives; but with definite differences. Few of us would 
freely continue to teach, to sell, to run a machine, or to build 
a home if a “big boss” looked over our shoulders each week 
and marked our cards with an A, B, C, or D. We do not need 
such prodding because each of us seeks the inner satisfaction 
of doing a job well. In fact, a great deal of the enjoyment we 
derive from work would be destroyed by a marking system 
patterned after school report cards. Another vast difference 
between school and life is that interpersonal comparisons in 
life are made between people in the same occupation. Typi- 
cally, school marks are based upon the erroneous assump- 
tion that all children are the same— that they should run the 
same course and finish at the same time. 

6. Grades tend to penalize those pupils most in need of 
help. It has been said, with some truth, that a child is most in 
need of love when he is most unlovable. We might say that 
when the child is most in need of encouragement (because a 
task is difficult for him) he is most likely to be discouraged 
by the awarding of a low grade. Frequently, it is the child 
who is working up to (sometimes apparently exceeding) his 
indicated capacity but who is below “grade level” who is fur- 
ther discouraged by a low mark. 

7. Marks have little meaning in themselves. It is a delusion 
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cause of the dangers which seem to be inevitably attached to 
it. 5 

PROMISING PRACTICES IN EVALUATION 


The “promising practices” which follow are not listed in
order of merit, for their applicability varies in relation to such
factors as the community, the competence of the professional
staff, and the intelligence and grade level of the pupils.

1. Letters to parents. In place of the report card with boxes
containing marks after “reading,” “arithmetic,” “deportment,”
and the like, some teachers are writing letters to the parents to
facilitate communication between home, school, and child.
Sometimes these letters are completely informal, indicating
only those factors that seem to be most distinctive concerning
the particular child. Sometimes the letters are accompanied
by an outline which includes such items as intellectual growth,
emotional control, social development, unique weaknesses,
and outstanding gifts and qualities. These letters need not be
sent on definite dates. A guiding schedule may be worked out
so that three or four letters are sent each week, but in the
event of need a letter may be sent well ahead of schedule. In
the course of a year, three letters or half-a-dozen letters may
be sent regarding one child, whereas one or two will suffice
for another. In fact, one of the specific merits of this plan is
its flexibility.

2. Home visits. An excellent means of communication be-
tween teachers and parents is for the teacher to make a visit to
the child’s home. There are, however, some hazards which
must be evaluated if the plan is to succeed. Some persons who
live in homes which they wish were considerably better are
embarrassed by the teacher’s visit. Hence, home visits should
be approved by the parent before the teacher calls.

5 This is the author’s opinion, but it is […] and reflection.
It is our hope that discussion of the contention may lead
to some fruitful conclusions.
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Subject-matter mastery is the primary emphasis of grades. 
Again, it must be admitted that such an emphasis is worth- 
while, but not to the apparent exclusion of other values. 
Evaluation places subject-matter achievement in the context 
of pupil development. 

Grades are typically given at the end of a period of work 
— at the conclusion of a unit of time, and as such have little 
value in diagnostic and remedial procedures. Evaluation is 
specifically designed to capitalize upon strengths and to rem- 
edy weaknesses. The purpose of grades is to judge the person 
and his work, whereas the purpose of evaluation is to guide 
the person and his work. 

Grades often become the end and aim of learning activ- 
ities, whereas evaluation points the way to more productive 
living and learning. Grades are, at least in part, a concom- 
itant of the policy of blocking out subject matter in pre- 
scribed units, books, and courses. Evaluation is a personal 
matter, and its philosophy implies the use of subject matter
to achieve the social ends which seem most appropriate to 
the individual. 

These contrasts are, of necessity, generalizations that will 
not always hold true for specific cases. Some teachers may use 
grades in such a way as to approach the values indicated for 
evaluation; and the various means of evaluation may be used 
in such a way that they are no more meaningful than grades. 
This lack of sharp contrast or distinct differentiation between 
grades and evaluation leads us to recommend that systems of 
evaluation be introduced gradually in the school — by taking, 
for example, the first three grades for a “test run,” by confer- 
ring with a few parents at the beginning, and by frankly ad- 
mitting that the new system is experimental. But there should 
be no attempt to reconstruct, modify, or alter the traditional 
grading system because modifications can too easily return to 
the former inadequacies. The system should be abandoned be- 
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should be made only if it is absolutely necessary. The teacher 
should not attempt to solve problems on the first visit but 
rather to open the way to further cooperative study of the pupil. 

3. Teacher-parent conferences in the school. Conferences 
at school have essentially the same purposes as home visits.
The particular advantage of the school visit is that the parent
can be shown some of the child’s work and the objectives of
school activities can be more clearly explained. Test data may
be examined and interpreted with greater exactness when it is
deemed advisable to reveal the information. Some parents
may prefer visiting the school to having their living conditions
or their immediate neighborhood revealed. School visits do not
require so much of the teacher’s time, but they should never-
theless be a scheduled activity and a responsibility of the ad-
ministration as well as the teacher.

Many of the suggestions for making home visits apply also to
teacher-parent conferences. Additional suggestions are as
follows: Do as much listening as possible consistent with not
permitting the time to drag. Maintain an expression of cheer
and confidence; this may seem a superficial suggestion, but it
is fundamental to a successful conference. Avoid any
indication of shock at […], rough language, or questions
regarding […]. Keep the number of criticisms to a minimum.
Offer one or two constructive suggestions at an interview; too
much advice, even when it is good, is likely to overwhelm. If
one of the parents is criticized by the other, do not take
sides on the issue even if one is clearly wrong.

4. Self-appraisal. Self-appraisal is a difficult thing; some of
us are too critical of ourselves, and others are too lenient in
self-evaluations. Nevertheless, the development of this skill
is an exceedingly important educational objective since it is a
major factor in the individual’s vocational success. Practice
may be begun in the first grade. The natural tendency toward
self-evalu-
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ample, the teacher may offer to call at a specific time, but 
“ ... if it is inconvenient, we shall be glad to have you visit 
us at school on [a definite date].” Many parents are more at 
ease in their everyday surroundings than in the relatively 
strange atmosphere of the school. 

A good deal of the objection to home visits comes from 
teachers who are somewhat reluctant to make calls. It may 
not be an easy thing for some teachers at first; but those who 


have become accustomed to it have frequently asserted that 
they could never return to another kind of reporting practice 
because of the new view of the child they got from the home 
visit. One teacher who has used the practice for years in the 
sixth grade remarked, “There are few things about which I 
am reluctant to talk to parents or which I hesitate to say. 
After all this time, I know the parents very well and have 
learned how to approach them. Pupil problems are so much 
easier to handle now.” Another objection to home visits is the 
time required. School administrators are responsible for see- 
ing that time is provided for this worthy enterprise. If the 
teacher does not have time to visit all the homes, he should 
attempt to visit selected homes. However, these should not be 
only the homes of the pupils who are having some kind of 
difficulty; even if only a few visits are possible, both the 
“good” and the “bad” should be included.


There are several ways of ensuring successful home visits, 
The teacher can prepare the parents by telling the child sev-
eral times what the purpose of the visit is: “We are just going
to have a talk. I hope to talk with the parents of all the chil-
dren.” A formal letter to all parents will supplement the ex-
planation given to the child. A definite date and time should
be set in advance. The visit should not be an inspection tour;
any suggestion of evaluation of the home will spoil the accord 
which the visit is designed to establish. It is important to find 
something pleasant to say about the pupil; negative statements 
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Self-appraisal obviously cannot replace other forms of 
evaluation. As a supplementary device, however, it is worthy 
of careful trial because it contributes so much to the imple- 
mentation of democratic theory. 

5. Teacher-pupil conferences. Conferences between pupil
and teacher are really an aspect of pupil self-evaluation, with
a difference in emphasis. In pupil self-evaluation the teacher’s
role of adviser is held to a minimum and the relationship re-
sembles client-centered counseling. Teacher-pupil evaluation
is based on the belief that there is value in a give-and-take
relationship. If the teacher has something critical to say, he
will say it, always, of course, with the view in mind of pro-
moting pupil growth. Teacher-pupil evaluation gives open
recognition to the responsibility of the teacher for positive
leadership.

In a way, this technique falls short of pupil self-appraisal,
but it is a long step beyond the teacher’s giving a grade. It
brings the pupil more directly into the evaluation process,
which is an integral part of the learning process, and […]
traditional practices.

6. Teacher-pupil-parent conferences. Our discussions of
teacher-parent conferences and pupil self-appraisal have led to
consideration of this technique. The value of home-school con-
tacts is widely recognized, but quite frequently this recognition
seems to ignore the child, or at least to treat him as though he
were a disinterested part of the entire procedure. There are
probably times when it is advisable for the child to be unin-
formed about some matters bearing on his welfare. It is
probable, however, that these situations occur less often
than many anxious teachers and parents believe. If
reporting is a manner of communication, then it seems logical
to admit that the pupil must be intimately involved.
The clear advantage of the three-way conference is that the
likelihood of misunderstanding is reduced. There is
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ation is reflected in such statements by children as “This is 
good” or “This is no good” in referring to blocks they have 
piled up, pictures they have drawn, or games they are playing. 
Unfortunately, this tendency is curbed by grades and marks, 
which make the child dependent upon the teacher’s evaluation 
to the extent that the one criterion of success becomes ac- 
ceptability to the teacher. Teachers can help promote self- 
appraisal by commending effort and its products or by asking 
the pupil if the work could be improved. If the teacher feels 
that the youngster is wrong in his appraisal, he should not try 
to impose an evaluation. Skill in self-appraisal, like other 
human traits, is the result of growth and development. 

Self-appraisal is not an ability that flourishes when exer- 
cised at six-week intervals; it requires daily practice. The 
teacher should record the pupil’s oral efforts at evaluation and 
encourage the pupil to compare his present efforts with his 
past work. Group discussion, even in the primary grades, can 
help children achieve better self-evaluation. Classmates’ praise 
or censure of the child’s conduct and work stimulates him to 
make his own evaluation. As children progress through the 
grades, it is advisable that some of the evaluations be written. 

The child may write a letter to the teacher or parent regard-
ing his evaluation of his performance in social and academic
functions in the classroom. 


Wherever possible children should evaluate their own work,
[…] accomplishment and progress by charts kept by each
[…] come to you with different capacities and train-
ing, the competitive system of marking most usually practiced be-
comes unfair competition and often leads to dishonest practice.
[…] pressure where it is least helpful and very often allows
the bright child to maintain himself with no exertion. 6

6 Faith Pascal, “When the Child Makes […],” Records
and Reports, Bulletin 77, Washington: Association for Childhood
Education International, 1942, p. 29.




to the fact that it is informative to know how the child 
achieved his present status. Quite frequently, considerable 
concern about the growth and status of a child would be al-
layed if the teacher could but picture clearly what the child
was a few months earlier. When one sees a youngster every
day, it is easy to overlook the minute increments of growth
which add up to an encouraging total. The cumulative record
can help teachers to see the child’s progress more clearly. 

It would seem desirable to have a nationwide standard
cumulative-record card or folder, or at least a uniform card
for use within each state. Such a card would facilitate the un-
derstanding of a pupil as he transfers from one school to an-
other. Since we do not have standard cards, either nationally
or within states, it is feasible for each school system to work
out its own card, thus ensuring the recording of those data
which are most important for the particular school system.
The selection of the type of data to be recorded is no easy
task. One suggested guiding principle for making a cumulative
record is to keep the information to a minimum, for an excess
of data is discouraging to the teacher. System and regularity
in keeping the record will make up for some deficiency in
the amount of information noted. With these observations as a
starting point, it is recommended that the record include the
following items:

Personal Data. Name, sex, birthplace, date of birth, father’s
name, nationality, occupation, mother’s name, nationality, oc-
cupation, family status (married, divorced, etc.), siblings ([…]
and age), and language spoken in the home.

Chronology and Address. Several lines will be needed for
changes of school and address: date entered, class or grade,
name of school, home address, phone, and significant remarks
about the home.

Conference Notes. Notes on the outcomes of conferences,
with the date and grade status of the child at the time of the
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much room for misunderstanding when percentage grades and 
letter tags are used. It has been shown that there is relatively 
little consistency in the use of grades among teachers, and 
their meaning is likely to vary still more among those who 
deal with them less frequently. But if there is a lack of under- 
standing between persons in a conference, questions will stim- 
ulate an answer that might lead to clarification. 

The observations of both parents and teachers leave little 
doubt that the three-way conference increases understanding 
of the pupil. The parent knows his child better for seeing him 
in action in another part of his environment. The teacher 
knows the pupil better for seeing him in contact with the other 
adults who so greatly influence his life. 

Some of the hazards and shortcomings of this method of 
appraisal are the following: The negative attitude of the 
teacher who feels that it is an imposition on his time to con- 
duct these conferences is a very real obstacle to the success 
of this method. Also, holding conferences may cause teachers 
to neglect other means of evaluation, such as cumulative 
records and personnel cards, because no record is kept of the 
interview, although records are an important responsibility of 
the school. The technique will be increasingly difficult to use 
as the pupils progress through the grades, because departmen-
talization of instruction puts the pupil into contact with more
and more teachers who know him less and less intimately. 
This defect is, however, no greater than the hazard of giving 
a child a grade on the basis of superficial knowledge about 
him. The teacher-pupil-parent conference is not a panacea
for the problems of evaluation in education. It is another 
means of communication; but examinations, inventories, pro- 
jective techniques, cumulative records, and staff conferences 
are also a part of the process of evaluation. 

7. Cumulative records. The teacher’s conviction that the
child should be accepted for what he is should not blind him 
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Experimenting with Evaluation Techniques 

It is all too evident that no single perfect method of evalua- 
tion has yet been devised. It is therefore recommended that 
each school have a committee to deal with evaluation prob-
lems, experimenting with techniques devised by members of
the staff or adapting techniques in use elsewhere. Evaluation
so intimately affects the entire operation of the school
(curriculum, methods, promotion policy, philosophy) that it
constitutes an effective focal point for critical examination of
the entire school. 

Many lists describing the effective teacher have been formu-
lated, but one criterion is always present in some form: The
good teacher is learning, growing, or progressing. Time spent
on local problems of evaluation will be fruitful from the stand-
point of teacher growth. Incidentally, it should be mentioned
that the success of any of the techniques mentioned in the
foregoing section (letters, conferences, cumulative records,
student self-appraisal) will depend first of all upon the teacher’s
acceptance of the idea. Improvements in the techniques will
also depend upon the teacher’s acceptance and understanding.
William L. Wrinkle, after a careful study of many practices
in evaluation, concludes: 7

Perhaps no final bit of advice would be more appropriate
than ... the following statement made by Franklin D. Roose-
velt in his 1932 Baltimore address: “Do something; and when you
have done something, if it works, do it some more, and if it does not
work, do something else.” There is one very happy aspect involved
in attempting to bring about improvement in marking and reporting
practices: whatever you may do has little chance of being
more objectionable or less adequate than the practice […].

7 William L. Wrinkle, Improving Marking and Reporting Practices,
New York: Rinehart & Company, Inc., 1947, p. […].



meeting, should be a part of the record. Care must be taken 
to avoid the temptation to record anything that is unneces- 
sarily derogatory about the home, the child, or the parents; 
for example, “We have asked the Kiwanis to give Don a pair 
of shoes” is preferable to “Mr. B. has made no effort to see 
that Don is properly shod.” Or, “Whenever possible, we should 
keep Don after school and let him work at the projects he 
likes so well” is better than “Don’s choice of after-school com-
panions is consistently bad.”

Record of Attendance. This should include terms, dates, 
punctuality, and school progress. 

Achievement-test Data. These must be complete to be 
meaningful. They might include date, grade, name of test, 
subject, form, and grade placement or other standard results 
such as percentile rank and standard score. 

Intelligence-test Data. Date, grade, name of test, form, 
chronological age, MA, IQ, examiner (if an individual test), 
and other standard results should be included. 

Significant Behavior or Personality Observations. Again, 
care must be taken to avoid unnecessarily derogatory remarks. 
The purpose of the whole field of measurement and evalua-
tion is to help the child. It is doubtful if a recording of nega-
tive data that might prejudice the next observer will serve this
purpose. 

Anecdotal Reports. These might be subsumed under the 
previous heading, but since they may involve considerable 
space a separate heading might well be considered. 

In some cumulative records considerable space is devoted 
to a description of the use of the various blanks. These ex- 
planations increase the usefulness of the record by providing 
a common basis for posting and interpreting. Holding a staff 
meeting at least once a year on the meaning and purpose of 
the record is a worthwhile practice. 
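
The items recommended above can be pictured as a simple data structure. What follows is only a minimal sketch in Python, not a prescribed card layout. The class names, field names, and the helper methods are hypothetical; dates are assumed to be written as ISO strings (year-month-day) so that they sort correctly, ages are assumed to be posted in months, and the ratio formula IQ = 100 x MA / CA is the classical convention, assumed here rather than taken from this section.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AchievementTestEntry:
    """One posted achievement-test result (illustrative field names)."""
    date: str                      # assumed ISO format, e.g. "1951-10-01"
    grade: int
    test_name: str
    subject: str
    form: str
    grade_placement: Optional[float] = None
    percentile_rank: Optional[int] = None


@dataclass
class IntelligenceTestEntry:
    """One posted intelligence-test result (illustrative field names)."""
    date: str
    grade: int
    test_name: str
    form: str
    chronological_age_months: int
    mental_age_months: int
    examiner: Optional[str] = None  # posted only for an individual test

    def ratio_iq(self) -> int:
        # Assumption: classical ratio IQ, 100 * MA / CA, rounded.
        return round(100 * self.mental_age_months / self.chronological_age_months)


@dataclass
class CumulativeRecord:
    """A deliberately small record, in keeping with the advice to post little."""
    pupil_name: str
    conference_notes: List[str] = field(default_factory=list)
    achievement_tests: List[AchievementTestEntry] = field(default_factory=list)
    intelligence_tests: List[IntelligenceTestEntry] = field(default_factory=list)
    anecdotal_reports: List[str] = field(default_factory=list)

    def growth_in_grade_placement(self, subject: str) -> Optional[float]:
        """Change in grade placement from the earliest to the latest entry."""
        scored = sorted(
            (t for t in self.achievement_tests
             if t.subject == subject and t.grade_placement is not None),
            key=lambda t: t.date,
        )
        if len(scored) < 2:
            return None  # one posting shows status, not growth
        return round(scored[-1].grade_placement - scored[0].grade_placement, 1)
```

A helper such as growth_in_grade_placement reflects the point made earlier: a single posting shows only status, whereas successive postings let the teacher see the small increments of growth that are easy to overlook from day to day.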
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being developed. The application of these techniques has led 
to the formulation of the following basic principles of evalua- 
tion: 

1. Pupil evaluation is a means of communication between
the school, the child, and the home. As such, it must be
meaningful to all concerned.

2. The purpose of evaluation is to promote optimum
growth. An indication of status is not enough.

3. Evaluation should indicate what steps should be taken
next. A statement of desired behavior is an inherent re-
sponsibility of all the evaluators.

4. Appraisal should be in terms of individual accomplish-
ment and not in terms of interpersonal comparison.

5. Evaluation should be in terms of the stated objectives of
education for the school level concerned. The demands
of such groups as college professors, registrars, and em-
ployers should not shape the entire evaluation practice
at any level.

6. Evaluation should be a continuing process, not an
end in itself at any point in the pupil’s growth.

7. Objective data are necessary, but these data are always
relative to the living, dynamic person.

8. Alterations of evaluation procedure affect the entire
philosophy of the school and must be a matter
for serious study. Change should take place only as
rapidly as those concerned are convinced of its value.


SUMMARY 


The object of appraisal practices is communication between
teacher and pupil or between teacher and parent. Because ap-
praisal relates to all phases of a pupil’s growth, it is difficult,
if not impossible, to find a common […]. But be-
cause appraisal is so intimately connected with growth,
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If we reflect upon the purposes of the school and upon the 
purposes of evaluation, we immediately remember that edu- 
cation is a common enterprise involving the home, the com- 
munity, the child, the teacher, and the administrator. In con- 
sidering ways to improve evaluation procedures, it is heartily 
recommended that a committee or informal group be called 
together to discuss some of the problems. This group should 
consist of some parents who have evidenced interest in the 
school, some citizens who are willing to devote some of their 
time to the problem, a student or two who have the ability to 
speak with clarity (they need not necessarily be the brightest 
in the class), some teachers who can resist the temptation to
dominate a discussion of educational problems, and an ad- 
ministrator. This group can consider the purposes, methods, 
and tools of evaluation. If changes in the system are war- 
ranted, it will be helpful to have a group of parents, citizens, 
and children who will serve as a vanguard in the job of in- 
terpreting the changes to the community. 

The technique of involving parents, citizens, and pupils in 
a consideration of school problems has been tried in many 
localities, and the consensus of school workers who have 
evaluated such groups is that they are indispensable. Some- 
times good practices have failed because of inadequate in- 
terpretation to the public. When a committee is called to- 
gether, the urgency of this phase of forward movement is made 
so apparent that it cannot be overlooked. 

ELEMENTS OF GOOD APPRAISAL PRACTICE 

It has been shown that present methods of appraising pupil 
development and progress in the school are open to serious 
criticisms, and as yet, no universally acceptable substitute for 
the questionable practices has been devised. Marked improve- 
ments are possible, however, by means of techniques now 
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parent conferences in the school have such inherent advantages
as economy of the teacher’s time, availability of cumulative
data, and introduction of parents to the materials and meth-
ods with which the pupil and the teacher are familiar. Pupil
self-appraisal should be an objective of all evaluation, and
pupils must be given specific opportunity to practice self-ap-
praisal. Teacher-parent-pupil conferences combine many of
the advantages listed above and have the additional advantage
of consciously bringing the pupil onto the scene. These various
means of communication do not erase the need for cumulative
records, which permit communication between various per-
sons related to the pupil at successive periods of time.

No universally acceptable appraisal practice has yet been
devised. Each local school system must plan its own most
effective evaluation procedures. This is arduous work involv-
ing the coordinated efforts of teachers, pupils, parents, citi-
zens, and administrators. The effort will be fruitful, however,
not only because appraisal facilitates pupil growth but be-
cause the improved communication will result in better gen-
eral educational practice.


STUDY AND DISCUSSION EXERCISES 


1. Teachers often say, “We’d like to change, but parents
will not let us.” Evaluate this statement by having the stu-
dents who are studying this book interview […] teachers and ten
parents. Contrast the degree of acceptance or repudiation of
change for each group.

2. Do you agree with all the so-called shortcomings of grades
that are listed in the chapter? Can you think of any other objec-
tions or advantages that have not been mentioned?

3. Has your own experience been one of satisfaction or dissatis-
faction with grades?

4. Divide the class into groups and have each group draw up
a list of ways of implementing each of the suggestions for improv-
ing appraisal practices. Bring the list to class for criticism and
further suggestions.
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the study of evaluation practices stimulates examination of 
education as a whole — its philosophy, methods, curricula, and 
materials. 

Grades and marks are open to criticism because of such 
factors as the following: Marks tend to become the purpose 
of education for the pupil; he works for the grade. Marks tend 
to stress subject matter as a primary aim, whereas pupil de- 
velopment or pupil self-realization is the transcending aim of 
education. Marks tend to become a cudgel for the inept 
teacher, who uses them for incentive instead of establishing 
more persistent motives. Grades tend to force all youngsters 
to progress at the same speed and to conform to one mold. 
They are a threat to the maximum enjoyment of school, since 
slow pupils persistently tend to get low marks and particularly 
able pupils are likely to get good marks without learning the 
valuable habit of rigorous application. Actually, marks have 
little meaning because of the different values teachers assign 
to them and because those who look at the marks “read” 
them differently. 

There are a number of contrasts between grading and other 
appraisal practices — contrasts that indicate the tendencies of 
the two practices. Some of these contrasts are communication 
with versus communication to; competition with self versus 
competition with others; subject-matter emphasis versus em- 
phasis on pupil development; guidance versus judgment; and 
emphasis on the products of learning versus the symbols of 
learning. 

A number of encouraging evaluation practices are now in 
operation. Although no one of these is perfect by itself, vari- 
ous combinations of the following methods will be improve- 
ments over “grades-and-marks” practices. Letters home, either 
informal or following a definite outline, make for increased
clarity of communication. Home visits are valuable when both
teachers and parents endorse this kind of contact. Teacher- 



CHAPTER FOURTEEN 


Toward a Planned Program of 
Evaluation 


Each chapter of this book has been devoted to a phase of the
total testing program. The view presented in this book is that
tests are samples of behavior which help the teacher to get a
better view of the pupil in order to facilitate his future devel-
opment. Both the uses and limitations of tests have been de-
scribed in order that teachers may capitalize fully upon the
values each test possesses. The specific problems involved in
testing ability, estimating achievement, appraising personality,
and evaluating classroom status have been discussed with a
view to helping the teacher see how each device may be used
to contribute to pupil development.

It has been necessary in this book to expand in later chap-
ters concepts that were mentioned in earlier ones; by this pro-
cedure the concepts and skills needed in effectively utilizing
tests have been presented by means of a spiral approach.
In this final chapter these concepts will be exemplified in a
proposed testing program. This suggested program may pro-
vide a point of departure for those who would like specific ad-
vice regarding the development of their program.
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5. Compare cumulative record cards or folders from several 
schools as to merits and shortcomings. 

6. Consult some recent educational periodicals to see if there 
are any recent reports on the value of newer appraisal practices. 

SUGGESTED ADDITIONAL READINGS 

Association for Childhood Education International: Records and 
Reports, Bulletin 77, Washington: The Association, 1942, 32

pp. 

Different phases of the problem of evaluation and reporting 
are discussed in this pamphlet by school workers with practical 
experience. Various views of pupils, parents, and teachers are 
represented. 

Elsbree, Willard S.: Pupil Progress in the Elementary School, 
New York: Teachers College, Columbia University, Bureau of 
Publications, 1943, 86 pp. 

The last two chapters in this booklet deal in scholarly detail 
with contemporary trends in the marking system and reporting 
to parents. The author’s list of trends in reporting indicates some 
of the things one needs to include in his thinking about 
evaluation. 

Smith, Eugene R., Ralph W. Tyler, et al.: Appraising and Record- 
ing Student Progress, New York: Harper & Brothers, 1942, 
550 pp. 

This is Volume III of the series “Adventure in American Educa-
tion,” which deals with the widely known “Eight-year Study”
or “Thirty-school Experiment.” This book describes how eval- 
uation was carried on in such areas as thinking, appreciation, 
personal and social adjustment, and interests. 

Wrinkle, William L.: Improving Marking and Reporting Practices 
in Elementary and Secondary Schools, New York: Rinehart & 
Company, Inc., 1947, 120 pp. 

This book is based on ten years of experimenting with better 
evaluation practices. The interdependence of appraisal, report- 
ing, and educational practice and theory is recognized, and 
specific suggestions are made for tentative departures from 
traditional practice, but the author makes no pretense of having 
discovered a panacea. 
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
that some educational values will result, but upon the opera- 
tion of the program. How well the program operates will de-
pend upon (3) the choice of the most appropriate tests (see
Chapter 3) and upon (4) a workable system for maintaining 
the results — a simple cumulative record. Unless the entire staff 
has participated in the formulation of objectives and plans for 
operating the program, it is desirable that (5) some training 
sessions be devoted to the administration, scoring, and in- 
terpretation of results. 


Reading Readiness 

The minimal testing program suggested here has as its ob-
jective helping the teacher understand the pupils rather than
revealing to the administrator the status of the pupils or help-
ing the supervisor evaluate instruction.

In the first grade, the information most vital to the teacher
is whether or not the pupil is adequately prepared to begin
his study of reading. The best test for this purpose is one de-
signed to estimate readiness rather than to measure intelli-
gence. There are several reasons for this. Group intelligence
tests at best yield only approximations of intelligence, and
those given in the lower grades are even less dependable;
first-grade pupils are too small to grasp the importance of
their task and too lively for prolonged periods of concentra-
tion; and successive scores on individual tests show that this type
of intelligence test is more dependable for older pupils than
for preschool and primary pupils. Hence, it is advisable
to delay giving group intelligence tests until the pupil
has become accustomed to his new environment and can give a
more accurate account of himself. Further, there may
be some harm, in view of the common misunderstanding of
the meaning of test scores, in recording on the cumulative
record a score that might later cause some teacher to mis-
judge his ability.
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This program is regarded as a minimum; without these 
basic tests effective instruction will be unnecessarily difficult.
Although it is possible to do too much testing, it is advisable 
to use more tests than are included in this suggested program. 
Test results, as we have seen, are to be regarded as supplemen- 
tary and corroborative data which are more likely to serve 
their purpose when they are correlated with information from 
other sources: teacher observation, cumulative records, and 
past school performance. 

Determining the Objectives 

In Chapter 3, “Choosing the Right Test,” it was indicated 
that tests can be used for a variety of purposes. The superin- 
tendent or principal may wish to know more about the ap- 
proximate level of ability of pupils in the school system and 
the extent to which pupils are capitalizing on that ability in 
terms of academic achievement. The supervisor may wish to 
use standardized tests to evaluate the effectiveness of instruc- 
tion in the area of his jurisdiction. A one-session test will yield 
only a minimum of information; better results will be obtained 
when successive tests are used to give data regarding develop- 
mental trends. Thus it is clear that planning is necessary in 
even a simple situation. 

We have seen that test data can be used to make instruc- 
tion more effective. In order that these objectives be clearly 
stated and adequately understood, it is desirable that teachers 
participate in the meetings in which the objectives are de- 
termined. In larger school systems it may be impossible for 
all teachers to be active in the statement of objectives; in such 
situations bulletins and meetings can be devoted to interpreta- 
tion and clarification. 

Achievement of the objectives of the testing program will 
depend not only upon (1) the teachers’ understanding the objectives,
uses, and limitations of tests and (2) their conviction
gence-test results at this stage of development; hence, in
addition to the periodic testing of children by audiometrists,
physicians, ophthalmologists, and school nurses, the teacher must
be persistently alert to the symptoms of visual difficulty,
auditory difficulty, and acute and chronic infection.
Representative of the telltale symptoms for visual difficulty are
squinting, excessive blinking, twisting the head when looking at the
chalkboard, watering of the eyes, sties and granulated eyelids,
attempts to brush material off the printed page, and bending
abnormally close to a book. Common symptoms of auditory
difficulty are turning the side of the head toward the source
of sound, inattentiveness, boredom, cupping the hand behind
the ear, ignoring simple requests and questions, complaints
of buzzing in the ear or of earaches, speech defects and [...]
voice quality, and sometimes seclusiveness and poor school
work. Indications of acute or chronic infections may include
many of the above symptoms, such as listlessness and
inattentiveness, as well as frequent absences from school,
drowsiness, lack of interest in play and schoolwork, and
irritability. These symptoms, like the scores on standardized
tests, should be taken as informative data to be supplemented by
corroborative evidence. The teacher does not diagnose on the
basis of symptoms, but his awareness of them will lead to
earlier and more frequent referral of pupils who might benefit
from special medical attention.


Diagnostic Reading Tests 

In the second and third grades, the major [...]
of the pupil and teacher is reading. Some children [...]
not have accomplished the development that would [...]
a general readiness for reading. Tests may show [...]
are psychologically ready, but actual performance reveals
achievement which falls short of their [...]. In
order to save pupils from the trauma of
Although the evaluation of general mental ability may not
be the factor of prime importance in the primary grades, it
becomes of greater interest in the intermediate grades. Hence
it is recommended that the testing program include a series of
group intelligence tests, beginning in the third grade. Some
schools administer these tests in the third, fifth, and seventh
grades. However, since group intelligence tests are of
questionable accuracy and the rate of mental development is still
variable in the elementary school years, it seems desirable
to test in each grade if it is financially feasible.

The tests should be given during the fifth or sixth week of
school rather than immediately. The pupils should be allowed
time to settle down after the vigorous activities of their
vacation, and new pupils should be given time to acquaint
themselves with their new human and physical surroundings.
(Pupils who enter during the school year should be given the
intelligence test after they have had time to become
acquainted.) The middle of the week will probably be the best
time, but the test period should not coincide with a
festival, school party, or athletic contest.

As we saw in Chapter 1, the teacher should respect his
skepticism if test results do not accord with his observation of
the pupil. In the event that a score seems too low for a
particular pupil, he should give an equivalent form of the test
or, if a psychometrist is available, an individual test. Whether
the score has surprised the teacher or not, it will be well to
compare the pupil’s present score with records from his previous
school years.


General-achievement Tests 

By the time the child has reached the fourth grade, he
should be beginning to acquire the information he
will need to live effectively. His interest should begin to
shift from reading, writing, and computation to subject-matter
fields such as geography, language, spelling, [...] studies. In
continued failure, it will be well to discover what their specific
difficulties are. Early correction of remediable difficulties will 
prevent the development of the “reading block” so frequently 
referred to. Blocks against reading are, for the most part, dis- 
like generated by chronic failure or a strong conviction that 
one just cannot learn to read. 

Diagnostic tests will help the teacher to determine whether 
the pupil is having difficulty in one or more of the following 
areas of specific reading factors: 1 recognition of visual
likenesses and differences in printed phrases, ability to analyze
words, recognition and understanding of spoken words, ade- 
quacy of reading vocabulary, interpretation of the message 
contained in sentences and paragraphs, understanding of fac- 
tual data, method for attacking new words, and skill in the 
use of tables of contents (a minor concern in the primary 
grades but of increasing importance at the upper grade levels). 

The diagnostic reading test may be given during the first 
two or three weeks of school and used as a source of information
for work with children as individuals and in groups. All
pupils may profit from wise use of the results of the diagnostic 
test. The information may suggest ways to help the able stu- 
dent make even better progress and thus provide motivation 
for continued development. 

Group Intelligence Tests 

As the teacher diagnoses reading difficulty, he thinks im- 
mediately of the mental ability of the pupil. It is possible that 
mental testing should take precedence over diagnostic testing 
in the second and third grades. If both cannot be done, it will 
be up to the teacher or the testing committee to decide which 
will be of greater immediate value. 

1 Teachers selecting tests might keep these points in mind as they
read the publishers’ manuals, catalogues, and reviews of diagnostic 
reading tests. 
of some of the things they may have forgotten during vacation,
and there are few holidays to interfere with characteristic
emotional stability. If the tests are given at this time, the teacher
has had a chance to evaluate the pupils but still has enough
of the school year remaining to benefit from the guidance of
the test data. It would be highly desirable if an equivalent
form of the test could be given in the spring to provide a basis
for the student’s evaluating his own progress and to give the
teacher a chance to judge the effectiveness of his teaching.


Upper-grade Reading Tests 

Most teachers appreciate the fallacy of the saying, “Practice
makes perfect.” It is much closer to the truth to say that
one learns by doing. The quality of one’s reading [...]
tends to improve if he does a great deal of reading, [...]
experimental investigation of reading also reveals that [...]
one reads extensively he may simply fix more firmly [...]
he has already developed. Maximum improvement will [...]
with directed, correct, and purposeful practice. The importance
of continued reading instruction in the intermediate [...]
upper grades is emphasized by the fact that at [...]
about twelve or thirteen, interest in reading reaches its [...]
point. Furthermore, it is at this age that interests shift from
the juvenile to material which is of interest to adults. [...]
from educational psychology indicate that the [...] skills [...]
are taught at this crucial period, the greater [...] that
interest in reading will continue at a high level.

Time should be regularly scheduled for [...] of silent-reading
skills. A silent-reading test will indicate the areas that are
in need of particular attention [...] provide a strong source
of motivation for steady application, which is perhaps still
more important. Some of the factors which a silent-reading
test might evaluate are comprehension of paragraph meaning;
appreciation of the organization of ideas —
the intermediate grades, interest shifts from the acquiring of
the tools of learning to practice in their use. However, this 
does not imply that all pupils have learned the skills so thor- 
oughly that the “fundamental processes” can now be neglected. 
As we shall see in the next section, there should be continued 
emphasis on improvement of the skills throughout the ele- 
mentary and secondary school years. 

Effective use of achievement tests by teachers requires that 
data be available regarding pupils’ ability. Intelligence tests 
give an indication of the pupils’ present intellectual status, 
whereas achievement tests give evidence of how effectively 
the pupils are using their ability. However, as was indicated 
in Chapter 6, “Evaluating Pupil Achievement,” high ability 
does not mean that the pupil should necessarily achieve at a 
high level; health, home factors, personality and social prob- 
lems, past experiences, and the number of current out-of- 
school activities must be considered in interpreting the data. 

In addition to indicating to the teacher whether the pupil’s 
achievement corresponds to his indicated capacity, achieve- 
ment tests help to evaluate the effectiveness of instruction. If 
the average ability of all pupils is near the national norm and 
achievement in language and reading is also close to the 
national norm while achievement in arithmetic and spelling is 
below average, it is possible that techniques of instruction in 
these two subjects should be examined. Individual teachers 
may find areas which they think, in terms of class averages, 
will need particular emphasis throughout the year. However, 
a class average above the norm in language does not neces- 
sarily indicate superiority of teaching; it may simply be a mat- 
ter of the school’s being in a superior neighborhood. Thus 
interpretation of data is essential. 

If only one battery of achievement tests can be given per 
year, October or November is probably the best time. Pupils 
have had an opportunity to settle down and to be reminded 
ality,” we saw that the defects of these instruments may
sometimes outweigh their possible advantages. However, since
personality and social factors are of major importance in the
classroom, the cautious use of projective techniques and
sociometry (see Chapter 9) may offer the teacher some
help in handling personality problems.

Ink-blot, cloud, and picture-interpretation tests as means
of evaluating personality should be used only by those who
are specially trained to interpret them. Even in the hands of
experts, these tests reveal both the strength and weakness of
projective techniques; i.e., the subject puts himself into the
test and the examiner projects himself into the interpretation
of results. Some projective techniques can be of value if the
teacher bears in mind this tendency of the person who
administers and interprets the test to make unique inferences.
Observation of children at play is recommended, not with the
aim of policing but to see how the child orients himself to
others, what his view of self is, and what his abilities are. The
writing of themes, stories, and compositions is recommended;
recurrent emphases or ideas, when corroborated by other
sources of information, can give teachers clues, not [...], to
personal and social adjustment. Drawing, painting, and [...]
painting also may afford some clues as to the child’s [...]
patterns. The teacher should remember, however, that
training is required for adequate interpretation, although he
may gain a deeper understanding of the pupil through
cautious study of his creative products.

Something of the value of sociometry is revealed by the
common remark of teachers, “I was surprised at the
difference between what I thought were the interpersonal likes
and dislikes and what was indicated by the sociogram.” [...]
new insights can be of great help to teachers in developing
seating and working arrangements for pupils [...]
group. The teacher must remember that, as we saw in Chapter
key words and phrases, ability to locate information; skill in
using indexes, tables of contents, references, etc.; and rate of 
reading. Many teachers have experimented with giving one 
form of a silent-reading test at the beginning of a six- to 
twelve-week period of special instruction and the equivalent 
form after the planned exercises have been completed. These 
experiments have been uniformly highly gratifying; students 
have shown gains as high as 50 to 100 per cent, often with 
average gains of 50 per cent in rate and comprehension of 
reading. (It should be noted, however, that unless there is 
some continuing emphasis on the elements of good reading, 
the pupil will tend to regress toward the level of his former 
reading habits.) Individual differences in ability and motiva- 
tion will also influence the variation in achievement. 

The importance of including silent-reading tests in the mini- 
mal testing program is indicated in the following passage: 2 

One of the most important of the modern advances in teaching
methods is the tendency to force elementary school and high school
students to read widely in many fields. Instead of confining the
students’ reading to a few textbooks relating to a limited number
of topics, the progressive school provides for and demands a wide
range of reading activity. Furthermore, the solution of most
classroom problems in the modern school requires the skillful use of
books as sources of information. In this sense, reading comes to
mean something more than merely rapid comprehension of printed
symbols and the memory and organizations of materials read. It 
becomes also an ability to use books and libraries as efficient 
sources of information. 

Evaluation of Personal and Social Adjustment 
In our discussion of the uses and abuses of tests and
inventories of personality in Chapter 8, “Appraising Person-

2 H. A. Greene, A. N. Jorgensen, and V. H. Kelley, Manual, Iowa
Silent Reading Tests: Advanced Test, Yonkers, N.Y.: World Book 
Company, 1931, p. 1. 



TOWARD A PLANNED PROGRAM OF EVALUATION 




A Check List for the Testing Program 
A number of considerations are involved in obtaining the 
best results from a testing program. The following check list 
will provide guidance in determining responsibilities and 
duties and anticipating difficulties: 3

                                                        Check
1. Purposes of the program
     Clearly defined
     Understood by parties involved
2. Choice of tests
     Valid
     Reliable
     Appropriate difficulty level
     Adequate norms
     Easy to administer and score
     Best available for purpose
3. Administration and scoring
     Administrators well trained
     All necessary information provided
     Scorers adequately instructed
     Scoring carefully checked
4. Physical conditions
     Sufficient space
     Sufficient time
     Conveniently scheduled
5. Utilization of test results
     Definite plans for use of results
     Provision for giving teachers all necessary help in using
       scores
6. System of records
     Necessary for purpose
     Sufficient for purpose
     Convenient form for use

3 Roger T. Lennon, “Planning a Testing Program,” Test Service
Bulletin no. 55, Division of Test Research and Service, Yonkers,
N.Y.: World Book Company, p. 3.
9, the interpersonal constellations will shift with the passage
of time and with changes in the situation. Therefore socio- 
metric designs should be redrawn as the occasion demands. 

Subject-matter Tests 

Tests that possess some of the advantages suggested in the 
section dealing with silent-reading tests exist for other areas 
as well. There are arithmetic tests which give some indication 
of specific areas of strength or weakness, i.e., addition, sub- 
traction, division, multiplication, or particular number com- 
binations such as the misapprehension that six times seven is 
forty-four. English-usage tests are available which yield sim- 
ilar diagnostic information. By means of such tests, much 
time can be saved by avoiding repetitious general drill when 
a small amount of drill on a specific detail would suffice. 

There are many tests in the social studies, and their com- 
position varies with the purposes of the test constructors. 
Sometimes the emphasis is upon the mastery of information; 
since facts are the basis of sound thinking, this is a justifiable 
emphasis. Other test makers, however, place primary stress 
upon the use and interpretation of data and upon techniques 
for acquiring information. The individual or committee re- 
sponsible for test selection will have to determine which kind 
of test will best fit the objectives that have been stated for the 
particular school concerned. 

It is probably best to administer subject-matter tests near 
the beginning of the school year, though the time will depend 
upon the specific purposes of the test. In addition to a planned 
succession of tests on a schedule, it should be possible to test 
at irregular times transfer pupils and pupils who were absent 
at the time of regular testing. If data are not available on a 
child when needed, teachers may become discouraged about 
the testing program and abandon the advantages that could 
accrue from a dependable program. 
avoided — for example, “Tommy sharpened his pencil five
times today between 2:15 and 3:10, each time [...]
brushing some other pupil on his way to the sharpener” rather
than “Tommy’s resistance to order and routine is revealed in
his chronic tendency to irritate others.” Teachers should
avoid making immediate interpretations of behavior, since the
value of the anecdotal record is in tying apparently [...]
bits of behavior together into a pattern and affording
perspective on the child’s growth over a period of time. Of course,
the record may also be used to study a particular child who is
experiencing difficulty in adjustment; then the account [...]
describe some particular behavior or situation which [...]
to be characteristic.

In the evaluation of the social effectiveness of [...],
rating scales may prove to be significant. Since it is [...]
to know what others think of an individual in order [...]
the way to personal and social improvement, the rating [...]
will provide clues to approaches. The more effective
scales will deal with specific situations rather than [...]
personality traits. The individuals doing the rating [...] know
one another rather intimately. Since raters differ in the severity
or leniency of their judgments, the final results [...] tentative
in nature. Since relationships change with [...]
[...], the results have temporary value only. Inasmuch as the
sociometric design is closely related to the rating [...], these
same precautions and reservations should be [...] using
sociograms.

Informal observations may prove to be a valuable
supplement to the evaluation program if the teacher [...] imagine
himself in the role of a psychologist rather than that of a
policeman. By doing more listening and less [...] and by
avoiding the show of shock, the teacher may [...] valuable
clues to pupil behavior. Too frequently, though, teachers are
full of good advice and the tendency to chide: “You
7. Personnel

Adequately trained for the purpose 

8. Affiliated research

Full advantage taken of results 

Provision for special studies, analyses, etc. 

It is likely that it will be possible to check more of the 
items in this list if planning for the program has been a co- 
operative affair. This group approach should include staff 
members’ sitting in the meetings having to do with test selec- 
tion, defining purposes, constructing the cumulative record, 
and planning the other details. This approach will do more 
than strengthen the testing program. It can be a means of 
welding the faculty into a stronger corps and a means of pro- 
moting individual teacher development. 

Informal Evaluative Techniques 
We have seen that formal and standardized tests constitute 
a substantial part of the program of evaluation but not the
entire program. There are also informal techniques of evalua- 
tion, such as anecdotal records, rating scales, and observation, 
which provide valuable, though not statistically accurate, data. 
Creative writing, drawing, and painting are also useful in 
evaluation of pupil behavior. 

Anecdotal records are valuable supplements to the evalua- 
tion program, but like standardized tests, they require the ex- 
ercise of skill in use and caution in interpretation. 4 (1) The 
anecdotal record should be a systematically recorded descrip- 
tion of the pupil’s typical behavior, and (2) it should be used 
periodically so that time for growth is allowed between groups 
of three or four consecutive descriptions. (3) The temptation 
to record teachers’ reactions to pupil behavior should be 

4 Helen Bieker in Fostering Mental Health in Our Schools, Associa- 
tion for Supervision and Curriculum Development, Washington, D.C.: 
National Educational Association, 1950, pp. 184-202. 
generally satisfied with the results. Two cities diagonally
across the United States from one another may perhaps be
considered representative — Arlington, Virginia, and
Vancouver, Washington. 6 Teachers and parents in these two cities
answer in the following ways some of the questions that are
most frequently asked in connection with this departure from
the conventional report card: Yes, it takes time, but teachers
find that their additional insights pay dividends in helping
pupils. Yes, pupils work even harder when the threat of grades
is removed. No, parents are not 100 per cent for the plan,
but in Arlington, 92 per cent of them are. Yes, it takes a
continuous parent-education program. Vancouver teachers [...]
that parents must be reeducated each year. No, pupils do not
lose in achievement. Pupils in both cities are, on the [...],
at or above the national norms for age-grade status. Yes, it is
definitely worth trying, for pupils, teachers, and parents
report increased understanding of one another and there
is better rapport.

SUMMARY 


An effective testing program must fit [...]
needs. The minimal testing program suggested in this chapter
must be considered only as a point of departure, a guide
to planning.

Actually, minimum programs recommended by various
scholars may vary from one test to as many as [...]. If one test
is used, it should be a mental-ability test, but in the primary
grades, the reading-readiness test is a more accurate indicator
of the ability required for school. Next in importance is either
the general-achievement test or the diagnostic [...] test —
depending upon whether the pupils are in the [...] grades or

5 Raymond H. Rignall, “Are Report Cards Necessary?” Family
Circle, [...]:104-111, September, 1952.
6 [...] F. Gaiser, A Guide to a Functional Program [...] Progress
to Parents, Vancouver, Wash., 1950 (mimeographed).
don’t really mean that.” Taking time for observation in the
classroom, on the playground, in the gym, and in all school 
activities will result in insights that will go much further to- 
ward changing undesirable pupil responses than the show of 
disapproval or the autocratic blocking of wayward conduct. 

Recording and Reporting 

The testing program will lose a substantial part of its value 
unless careful records are kept. Too frequently tests lose much 
of their value because they are used to find status rather than 
to indicate pupil development and progress. It would be de- 
sirable to have a somewhat uniform cumulative record used 
in different schools so that when pupils transferred the data 
that accompanied them would be readily understood. No 
less desirable in the cumulative record is brevity. It should be 
short in order to avoid overwhelming the teacher with facts and 
figures and to prevent the teachers’ spending hours on the 
clerical detail of recording. Spaces should be provided on the 
card or folder for personal data (name, sex, birth date, etc.), 
address, chronology of schools attended, achievement-test 
data, intelligence-test data and special test data (diagnostic, 
aptitude, etc.). In addition the folder may contain a few care- 
fully selected conference notes, reports of observations, and 
anecdotal records. It should be kept in mind that the purpose 

of the cumulative record is to facilitate the adjustment of the
child in his next school or to his next teacher. 

The maximum benefit of a well-planned and well-executed 
program of evaluation cannot be realized if practices of
reporting to parents remain on the traditional percentage basis,
the A, B, C, D, F categorization, or even the C, S, N innova- 
tion. Letters home and home visits have proven helpful. But 
the more promising practice is teacher-pupil-parent confer- 
ences at the elementary level and teacher-pupil conferences 
at the high school level. An increasingly large number of 
schools are using conferences in place of report cards and are 
4. Would you consider it more important to use equivalent
forms of intelligence and achievement tests, or to give single tests
in these areas and add others, such as mechanical- and
musical-aptitude tests?

5. Draw up a tentative schedule for test administration for the
entire year, giving the days of the week and the dates. Submit
it to your colleagues for suggestions and improvement.

6. How would you suggest that the testing program in a
[...]-teacher eight-grade elementary school be launched?

7. Get the help of some of your colleagues in drawing up a
cumulative record which would be adequate for what you [...]
as a good testing program.

SUGGESTED ADDITIONAL READINGS 

Cole, Lawrence E., and William F. Bruce: Educational Psychology,
Yonkers, N.Y.: World Book Company, 1950, pp. [...].

This survey of the origin, development and use of [...]
a good background for wise selection of tests. The authors stress
the need for keeping accurate records and for interpreting results.

Jordan, A. M.: Measurement in Education, New York: McGraw-
Hill Book Company, Inc., 1953, pp. 67-94.

This chapter deals mainly with achievement [...], but the
suggestions are detailed. Illustrative material is [...].

Knapp, Robert H.: Practical Guidance Methods, New York:
McGraw-Hill Book Company, Inc., 1953, pp. [...].

The author lists, from the pupil-guidance point of view,
suggested tests in such areas as those mentioned in [...]. He
provides illustrations of cumulative records.

Mursell, James L.: Psychology for Modern Education, New York:
W. W. Norton & Company, Inc., 1952, pp. [...].

Against a background of theory relating [...] and special
abilities, the author describes and evaluates in these two
chapters a number of intelligence and ability [...]. The material
provides a good basis for planning a testing program.

[...], Ernest W.: Educational Diagnosis, [...] Bulletin no. [...],
Los Angeles: California Test Bureau, [...] (free).

This pamphlet is a description of the use of diagnostic tests in
improving instruction. There are many sound practical
suggestions, although the endorsement of personality [...] seems
somewhat too hearty.
in the primary grades. Of perhaps equal importance with
achievement tests are silent-reading tests, because reading 
skills so strongly condition the pupils’ attitudes toward self and 
school. The evaluation of personality is often omitted from 
minimal programs. However, inasmuch as sociometric tech- 
niques and some projective techniques are inexpensive and 
informative and personality development is such an integral 
responsibility of the school, the authors recommend this ap- 
proach to personal and social evaluation. The inclusion of 
subject-matter tests, which closely resemble diagnostic tests, 
would place the program which included them on the border- 
line between a strong minimal program and one that ap- 
proached the ideal. 

Planning an effective testing program is not easy, but 
neither is effective teaching a simple process. Just as good 
teaching is made up of many separate steps, so the effective 
use of tests involves attention to many small details. The re- 
wards for taking these painstaking steps are great: teachers 
will get more satisfaction from their work because it is well 
done, and they will be fulfilling the fundamental human need 
for continued personal development. Pupils will be helped to 
develop more symmetrically while they are in school. But 
the greatest benefit is that another step will be taken toward
developing the robust pupil who, when his school days are 
over, can steer his own course. 


STUDY AND DISCUSSION EXERCISES 

1. [...] hours, make
[...] list of the names of tests (and their publishers) which
[...] testing program described in
this chapter.

2. Would you prefer to give a reading-readiness test or a group
intelligence test in the first grade? State your reasons.

3. [...] would you use next if it were possible to add
three more to the program outlined in the chapter?
each valued at one score point. If Mr. Brown copied the
scores from the answer sheets without a plan of organization, 
the results might appear like this: 

24, 20, 26, 16, 23, 25, 30, 24, 

19, 21, 24, 28, 20, 23, 32, 26, 

21, 25, 21, 18, 32, 24, 23, 15, 

25, 22, 29, 26, 23, 26. (N = 30)

Organized in rank order from high to low, the list of scores 
is more meaningful: 

32, 32, 30, 29, 28, 26, 26, 26, 

26, 25, 25, 25, 24, 24, 24, 24, 

23, 23, 23, 23, 22, 21, 21, 21, 

20, 20, 19, 18, 16, 15. (N = 30)
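
A teacher who keeps scores in a computer file could reproduce this ranking with a few lines of Python. The following is only a minimal sketch; the variable names are illustrative assumptions and are not part of the procedure described here.

```python
# Scores in the order Mr. Brown copied them from the answer sheets.
scores = [
    24, 20, 26, 16, 23, 25, 30, 24,
    19, 21, 24, 28, 20, 23, 32, 26,
    21, 25, 21, 18, 32, 24, 23, 15,
    25, 22, 29, 26, 23, 26,
]

# Rank order from high to low, as in the second listing.
ranked = sorted(scores, reverse=True)

n = len(ranked)                          # N, the number of pupils
highest, lowest = ranked[0], ranked[-1]  # top and bottom scores

print(ranked)
print(f"N = {n}, highest = {highest}, lowest = {lowest}")
```

Sorting changes only the order of the scores, so N remains 30 and every score in the first listing appears in the second.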


The Tally Sheet 

Another means of giving meaningful organization to [...]
of scores is a tally sheet, or frequency table, such as that
presented in Figure 20. Scores in this table are presented [...].
In some instances the teacher might wish to group the scores
in intervals of two, three, or five to provide a convenient
summary table. Steps in preparing the tally sheet are presented
with Figure 20.
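
The tallying can also be sketched in Python. The layout of Figure 20 is not reproduced here, so the function below rests on assumptions: the hypothetical helper `tally_sheet` starts its intervals at the lowest score, lists the highest interval first, and keeps intervals in which no score falls.

```python
from collections import Counter

# Mr. Brown's thirty science scores.
scores = [
    24, 20, 26, 16, 23, 25, 30, 24,
    19, 21, 24, 28, 20, 23, 32, 26,
    21, 25, 21, 18, 32, 24, 23, 15,
    25, 22, 29, 26, 23, 26,
]


def tally_sheet(scores, interval=1):
    """Return (low, high, frequency) rows, highest interval first."""
    lowest, highest = min(scores), max(scores)
    # Count how many scores fall in each interval above the lowest score.
    counts = Counter((s - lowest) // interval for s in scores)
    top = (highest - lowest) // interval
    rows = []
    for k in range(top, -1, -1):
        low = lowest + k * interval
        # Counter gives 0 for an interval that holds no scores.
        rows.append((low, low + interval - 1, counts[k]))
    return rows


# Print a tally sheet grouped in intervals of three score points.
for low, high, freq in tally_sheet(scores, interval=3):
    label = str(low) if low == high else f"{low}-{high}"
    print(f"{label:>5}  {'/' * freq:<12} {freq}")
```

Calling `tally_sheet(scores)` with the default interval of one gives a row for every score from 32 down to 15. Whatever the interval, the frequencies should add up to N, which is a quick check on the tallying.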

The tally sheet has the following values for the teacher:

It presents test results organized in terms of [...] score.

It presents a summary of the test results in a form
easily scanned for information.

It presents scores in a form which permits [...] and
additional calculations if they are desired.

Properly documented as to date, type of [...], grade,
and teacher, the tally sheet constitutes a permanent
record of the results of the test. The teacher may wish
Organizing Test Results for
Interpretation 


Developing effective ways to gather information about pupils 
is an essential teacher activity. An equally important activity 
is the task of organizing the data to permit analysis, compari- 
sons, and interpretations. 

The purpose of this appendix is to present some of the 
techniques which may assist the teacher to organize data in 
a meaningful way. Methods of ordering and recording scores 
and developing central reference points and certain relative 
measures are described, and an annotated list of references is 
presented to assist the teacher who is interested in developing 
an understanding of other, more rigorous statistical pro- 
cedures. The content of this appendix has been selected on 
the basis of simplicity and possibility of use by classroom 
teachers rather than on the basis of a criterion of essential 
mathematical precision or adequacy. 

Ranking Scores 

Mr. Brown has just scored a science examination for his 
30 eighth-grade pupils. The test contained thirty-five items, 
262 
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to use this record (a) in comparing two or «ore c 
(6) as an aid to the assignment of grad , ( > _ 

paring individuals with the group, and 1(d) ^ ^ 

ing the status of individuals relative 
typical performance of the group. 
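
The ranking and tallying described above can also be carried out with a short program. What follows is only a minimal sketch in Python, not part of the procedure as originally presented; the names `scores`, `ranked`, and `tally_rows` are hypothetical and chosen for illustration. It sorts Mr. Brown's scores from high to low and prepares one tally row for every score between the highest and lowest obtained (Figure 20 extends the list upward to the maximum possible score of 35).

```python
from collections import Counter

# Mr. Brown's 30 science-test scores, in the order copied from the answer sheets.
scores = [
    24, 20, 26, 16, 23, 25, 30, 24,
    19, 21, 24, 28, 20, 23, 32, 26,
    21, 25, 21, 18, 32, 24, 23, 15,
    25, 22, 29, 26, 23, 26,
]

# Rank order, from high to low.
ranked = sorted(scores, reverse=True)

# Frequency of each score (the "Frequency" column of a tally sheet).
tally = Counter(scores)


def tally_rows(tally):
    """Return (score, tally marks, frequency) rows from highest to lowest score."""
    rows = []
    for score in range(max(tally), min(tally) - 1, -1):
        frequency = tally.get(score, 0)
        rows.append((score, "/" * frequency, frequency))
    return rows


if __name__ == "__main__":
    print(ranked)
    for score, marks, frequency in tally_rows(tally):
        print(f"{score:>2}  {marks:<5} {frequency}")
```

As with the hand-made tally sheet, the sum of the frequencies should be checked against the number of test papers.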

REPRESENTATIVE MEASURES

The teacher may wish to establish [a single measure which]
best represents the performance [of the group on a test].
When asked, “How did your group perform [on the] test?” Mr.
Brown might answer, “The average score [...].” In effect
Mr. Brown has attempted to represent an entire set of scores
by means of one quantity. Such [a quantity is a] measure
of central tendency. Three such [measures], the mid-measure,
the arithmetic mean (average), and the median, are presented
[here].

The Arithmetic Mean. The [arithmetic mean, or average, is]
a statistic with which most [teachers are familiar. It is a] way of
encompassing a variety of [measures in] one quantitative statement.
Thus we may describe a pupil [as being] of average height or
weight, of average intelligence, [or as an] average fourth grader.
Instead of the term average we might use typical or represen-
tative.

Aside from its value as a representative measure, the arith-
metic mean can be utilized as [a point of] reference. For ex-
ample, when we say that [a pupil is above] or below average for
her age or grade in any specific [area], we are using the aver-
age as a reference point. In educational evaluations, the refer-
ence point is seldom [an] absolute quantity, like size
of score; rather it is a point [...] somewhere in the center
of a distribution of scores. The arithmetic mean is an example
of this type of reference point [to] which a series of test scores
can be related.



Test
score     Tally     Frequency     f (score)

 35
 34
 33
 32       //            2            64
 31
 30       /             1            30
 29       /             1            29
 28       /             1            28
 27
 26       ////          4           104
 25       ///           3            75
 24       ////          4            96
 23       ////          4            92
 22       /             1            22
 21       ///           3            63
 20       //            2            40
 19       /             1            19
 18       /             1            18
 17
 16       /             1            16
 15       /             1            15

  N       30           30           711


Fig. 20. Tally sheet, or frequency table, of science-test scores of 30
eighth-grade pupils.


Tally:

1. List units for range of scores from highest to lowest (column 1).

2. Tally scores from pupil answer sheets; check tallies with
number of test papers (column 2).

3. Sum tallies and record for each score (column 3).

Arithmetic mean (average):

1. In column 4 each score has been multiplied by the frequency
of that score (e.g., 32 × 2 = 64).

2. The sum of column 4 is the sum of all the scores (foot of
column 4).

3. Divide the sum of all the scores by the number of scores, or N,
at the foot of column 2 (e.g., 711 ÷ 30 = 23.7).

4. The result of this calculation (23.7) is the arithmetic mean or
average.
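
The four steps for the arithmetic mean can be checked with a few lines of code. The following is a minimal sketch in Python, offered only as an illustration; the names `frequencies` and `mean_from_frequencies` are hypothetical. Each score is multiplied by its frequency (column 4), the products are summed, and the sum is divided by N.

```python
# Frequency table from Figure 20: score -> number of pupils earning that score.
frequencies = {
    32: 2, 30: 1, 29: 1, 28: 1, 26: 4, 25: 3, 24: 4, 23: 4,
    22: 1, 21: 3, 20: 2, 19: 1, 18: 1, 16: 1, 15: 1,
}


def mean_from_frequencies(freq):
    """Return (sum of all scores, N, arithmetic mean) for a frequency table."""
    n = sum(freq.values())                                  # foot of the frequency column
    total = sum(score * f for score, f in freq.items())     # foot of column 4
    return total, n, total / n


total, n, mean = mean_from_frequencies(frequencies)

if __name__ == "__main__":
    print(f"sum of scores = {total}, N = {n}, mean = {mean:.1f}")
```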
Test        Continuous                      Cumulative
scores      scale           Frequency       frequency

 32         31.5-32.5           2               30
 31         30.5-31.5
 30         29.5-30.5           1               28
 29         28.5-29.5           1               27
 28         27.5-28.5           1               26
 27         26.5-27.5
 26         25.5-26.5           4               25
 25         24.5-25.5           3               21
*24        [23.5-24.5]          4               18
 23         22.5-23.5           4               14
 22         21.5-22.5           1               10
 21         20.5-21.5           3                9
 20         19.5-20.5           2                6
 19         18.5-19.5           1                4
 18         17.5-18.5           1                3
 17         16.5-17.5
 16         15.5-16.5           1                2
 15         14.5-15.5           1                1

                             N = 30


Fig. 21. [Calculation of the median.]

1. Find half the number of scores (N/2 = 30/2 = 15).

2. From column 4, find the cumulative frequency equal to or less
than N/2 (i.e., 14). The median is in the score interval imme-
diately above this.

3. Divide the size of the interval by the number of scores in
the interval which contains the median score (i.e., 1 ÷ 4 = .25).

4. Multiply this correction by the number of scores needed
to reach the midpoint of the distribution, that is, 15
less 14 (cumulated below the interval which contains the median
score) (15 - 14 = 1; 1 × .25 = .25).

5. Add the result of calculations in step 4 to the lower limit
(23.5) of the interval which contains the median (23.5 + .25 = 23.75).
The result is the median.
The arithmetic mean is calculated by summing all the
scores and dividing by the number of pupils for whom scores
have been recorded. Calculations may be developed from a
tally sheet or frequency table in the manner outlined in
Figure 20.

Some values and uses of the arithmetic mean are the
following:

1. It is a relatively stable and accurate representative meas- 
ure. 

2. It may be used as a point of reference with which the 
performance of individual pupils within the group may 
be compared. 

3. When the same test is used with two or more groups, 
the mean may form a basis for comparison of the 
groups. 

4. The mean forms the basis for the calculation of other 
measures, such as the standard deviation and standard 
scores. 

The Mid-measure. A second type of central reference point 
or expression of central tendency is the mid-measure. The 
mid-measure is the middle score of a series of scores. When 
the number of scores is even, the mid-measure is the average 
of the two scores nearest the middle of the distribution of 
scores. This measure is likely to be useful when the teacher 
needs only a quick and very approximate indication of cen- 
tral tendency. In the case of our distribution of science-test 
scores (Figure 20), the mid-measure (the average of the 
fifteenth and sixteenth scores) is 24. 
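
The rule for the mid-measure is simple enough to state as a short function. The sketch below is in Python and is meant only as an illustration; the name `mid_measure` is hypothetical. It takes the middle score when the number of scores is odd and averages the two middle scores when the number is even.

```python
def mid_measure(scores):
    """Middle score of a ranked series; average of the two middle scores if N is even."""
    ordered = sorted(scores, reverse=True)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Mr. Brown's 30 science-test scores.
scores = [
    24, 20, 26, 16, 23, 25, 30, 24,
    19, 21, 24, 28, 20, 23, 32, 26,
    21, 25, 21, 18, 32, 24, 23, 15,
    25, 22, 29, 26, 23, 26,
]

if __name__ == "__main__":
    # With 30 scores, the fifteenth and sixteenth scores are averaged.
    print(mid_measure(scores))
```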

The Median. The median is a point on a scale of scores 
which divides the distribution into two equal parts. That is, 
one half of the scores fall above the median and one half be- 
low this point. The median is computed from a frequency 
table or tally sheet such as that presented in Figure 21. For
purposes of computation the scores are regarded as a contin-
uous series. Each unit score (for example, a score of 25)
is considered to represent a range of achievement from 24.5
to 25.5, much as an inch on a foot rule may be regarded as 
a distance on a continuous linear measure rather than as a 
point. 

The method of computing the median is presented with 
Figure 21. This measure serves many of the same purposes 
as the arithmetic mean; it is a central reference point which 
may facilitate comparisons of individuals and groups. It is 
not ordinarily so stable as the mean, but it does have some 
advantages when the teacher wishes to avoid giving emphasis 
to extreme scores. 
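
The method of computing the median on a continuous scale can be sketched in code. The following Python function is only an illustration under stated assumptions: scores are whole numbers, each score s stands for the interval from s - 0.5 to s + 0.5, and the name `median_point` is hypothetical. It counts up from the lowest score until the interval containing the N/2 point is reached, then moves the necessary fraction of the way into that interval.

```python
from collections import Counter


def median_point(scores):
    """Median of whole-number scores treated as unit intervals on a continuous scale."""
    frequencies = Counter(scores)
    target = len(scores) / 2           # half the number of scores
    cumulative = 0                     # scores counted below the current interval
    for score in sorted(frequencies):
        f = frequencies[score]
        if cumulative + f >= target:
            lower_limit = score - 0.5
            # Each score in the interval is worth 1/f of the unit interval.
            return lower_limit + (target - cumulative) / f
        cumulative += f
    raise ValueError("at least one score is required")


# Mr. Brown's 30 science-test scores.
scores = [
    24, 20, 26, 16, 23, 25, 30, 24,
    19, 21, 24, 28, 20, 23, 32, 26,
    21, 25, 21, 18, 32, 24, 23, 15,
    25, 22, 29, 26, 23, 26,
]

if __name__ == "__main__":
    # Fourteen scores lie below 23.5, and four scores fall in 23.5-24.5.
    print(median_point(scores))
```

For these scores the function moves one-fourth of the way into the interval 23.5-24.5, which is the same calculation carried out by hand with Figure 21.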

The median is essentially a counting or ranking measure 
which emphasizes relative position rather than actual size of 
score. For example, in the following series the arithmetic 
mean is affected markedly by alteration of one extreme score, 
whereas the median is unaffected by the change. 

Series

A    90, 40, 38, 32, 30

B    45, 40, 38, 32, 30

Mean series A = 230 ÷ 5 = 46

Mean series B = 185 ÷ 5 = 37

Median series A and B = 38
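
The comparison of the two series can be reproduced with Python's standard `statistics` module. This is a minimal sketch for illustration only. Note that `statistics.median` returns the middle score of the ranked series rather than an interpolated point on a continuous scale; for these five-score series that middle score is the value given above as the median.

```python
from statistics import mean, median

series_a = [90, 40, 38, 32, 30]
series_b = [45, 40, 38, 32, 30]   # identical except for the one extreme score

if __name__ == "__main__":
    for name, series in (("A", series_a), ("B", series_b)):
        print(f"Series {name}: mean = {mean(series)}, median = {median(series)}")
```

Only the highest score differs between the two series, so the means differ while the middle score stays the same.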

STUDYING THE DISTRIBUTION OF SCORES 

Although measures of central tendency such as the mean 
and median are useful as reference points for comparisons and 
interpretations, a single point in a distribution fails to tell 
the whole story. An important consideration may be the 
extent to which scores are distributed over the range of the 
test. For example, in Figure 22 the distributions of scores for
class A and class B indicate that the two groups are not alike
in performance on the test although the means and medians
of the two groups are quite similar. The range of scores for
class A is greater than that for class B.

The teacher may study the dispersion of scores by means
of a tally sheet, and he may wish to record the range, which
is the difference between the highest and lowest scores. In the
case of class A (Figure 22) the range is 32 - 15, or 17. The
range for class B is 29 - 19, or 10.

The range is a relatively unreliable measure, readily influ-
enced by changes in individual scores at the extremes of the
distribution. However, this measure provides (a) a simple
method of describing the dispersion of a set of scores and (b)
additional information beyond that represented by measures
of central tendency.

Quartiles, Deciles, and Percentiles

A number of measures may be used to indicate various
points in the distribution. Among such measures are quartiles,
deciles, and percentiles, which divide the distribution into
quarters, tenths, and hundredths. The first quartile (Q1) is
a point which sets off the lowest 25 per cent of the scores.
The third quartile (Q3) is a point below which fall 75 per
cent of the scores. Quartile two (Q2) is identical with the
median in location and definition.

Deciles are points below which fall the indicated tenths of
the scores (e.g., decile 7 marks off the lower seven-tenths of
the distribution).

Percentiles mark off the indicated per cent of the distribu-
tion; for example, percentile 75 (P75) is the point below
which fall 75 per cent of the cases.

The calculation of all these measures is based on the as-
sumption of a continuous distribution, and all are calculated
in essentially the same manner as the median when the ap-
propriate common or decimal fraction is substituted in the
formula. Figure 23 illustrates [the relationship] between these
measures, and Figure 24 illustrates the calculation of
percentiles. General procedures for the calculation of quar-
tiles, deciles, and percentiles are as follows:

1. [...]

2. [...] desired fractional point.

3. Calculate the fractional “distance” into the next higher
continuous score interval necessary to reach [the desired point].

4. Multiply the fraction (step 3) by the size of the score
interval.

5. Add this quantity to the lower limit of the inter-
val in which the desired quartile, decile, or percentile
point is located. The result is the desired point in the
distribution.

Measures such as quartiles, deciles, and percentiles indi-
cate how an individual stands in relation to [the group from]
which the measures were derived. [Such] information is likely
to be more valuable than actual test scores (raw scores) for
purposes of evaluation and for permanent [records].

Interquartile and Semi-interquartile Range

The quartile points form the basis for a statistic which can
be used to find the dispersion [of scores] around the median.
The range is markedly affected [by changes] in scores at either
extreme of the distribution. [A more stable] estimate of disper-
sion is provided by the interquartile range, or the difference
between Q3 and Q1. This is the [range of] scores which includes
approximately the middle [50 per cent of the] cases. For the score
distributions of class A and [class B in] Figure 22, the values
of Q3 and Q1 for each [class] have been indicated. The inter-
quartile ranges have been calculated by the formula

Q3 - Q1 = interquartile range

It will be noted from Figure 22 that, although the range
seems to indicate a marked [difference] in dispersion in the case
of these two groups of scores, the interquartile ranges of 4.8
and 4.6 indicate that over the central [portion] of the distributions
the spread of scores is quite similar.
Test        Continuous                      Cumulative
scores      scale           Frequency       frequency

 32         31.5-32.5           2               30
 31         30.5-31.5
 30         29.5-30.5           1               28
 29         28.5-29.5           1               27
 28         27.5-28.5           1               26
 27         26.5-27.5
 26        [25.5-26.5]          4               25
 25         24.5-25.5           3               21
 24         23.5-24.5           4               18
 23         22.5-23.5           4               14
 22         21.5-22.5           1               10
 21         20.5-21.5           3                9
 20         19.5-20.5           2                6
 19         18.5-19.5           1                4
 18         17.5-18.5           1                3
 17         16.5-17.5
 16         15.5-16.5           1                2
 15         14.5-15.5           1                1

  N                            30


Fig. 24. Calculation of percentiles: How to find percentile 75. 


1. Find 75 per cent of 30, or 22.5. Locate in column 4 the cumula- 
tive frequency less than 22.5. This is 21. 

2. Percentile 75 is located in the next interval above cumulative fre- 
quency 21 or in interval 25.5-26.5. 

3. Find the difference between the computed P75 point (22.5) and
the nearest cumulative total less than this (22.5 - 21 = 1.5).

4. Since the size of the score interval is one unit and there are four
scores in the interval in which P75 is located, each score has a value
of 1 ÷ 4, or .25.

5. Multiply 1.5 × .25 = .375. This is the correction which, added to
the lower limit of the interval, brings us to percentile 75. 

6. Add the correction (.375) to the value of the lower limit of the 
interval which contains percentile 75. (25.5 + .375 = 25.875, or, 
when rounded, 25.9.) Percentile 75 is 25.9. 
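
The six steps for percentile 75 apply equally to any percentile, decile, or quartile once the appropriate fraction is used. The following Python function is a minimal sketch of that general procedure, not a definitive implementation; it assumes whole-number scores with unit intervals, and the names `frequencies` and `percentile_point` are hypothetical.

```python
# Frequency table from Figure 24: score -> number of pupils earning that score.
frequencies = {
    32: 2, 30: 1, 29: 1, 28: 1, 26: 4, 25: 3, 24: 4, 23: 4,
    22: 1, 21: 3, 20: 2, 19: 1, 18: 1, 16: 1, 15: 1,
}


def percentile_point(frequencies, percent):
    """Point on the continuous scale below which `percent` per cent of the scores fall."""
    n = sum(frequencies.values())
    target = percent * n / 100              # step 1: e.g., 75 per cent of 30 = 22.5
    cumulative = 0                          # cumulative frequency below the interval
    for score in sorted(frequencies):
        f = frequencies[score]
        if f and cumulative + f >= target:  # step 2: interval containing the point
            lower_limit = score - 0.5
            needed = target - cumulative    # step 3: e.g., 22.5 - 21 = 1.5
            value_per_score = 1 / f         # step 4: e.g., 1 / 4 = .25
            return lower_limit + needed * value_per_score   # steps 5 and 6
        cumulative += f
    raise ValueError("percent must lie between 0 and 100 and the table must not be empty")


if __name__ == "__main__":
    for p in (25, 50, 75):
        print(f"percentile {p}: {percentile_point(frequencies, p)}")
```

Percentile 50 computed in this way is the median, and percentiles 25 and 75 are the first and third quartiles.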
                          Achievement

                  1        2        3        4

            4                               C. F.

Ability     3    E. K.

            2                               N. T.

            1    D. R.


Fig. 25. A scattergram designed to indicate the relationship [between]
ability and achievement for selected pupils. [In this] diagram, quartiles
have been used to indicate relative standing within the group for each
test. Results have been indicated by [placing] the pupil's initials in the
appropriate cell as follows: (a) C. F. ranks in the highest quarter in
ability (row 4) and also in achievement (column 4). Hence [his]
initials appear in the cell representing the intersection of row 4 and
column 4. (b) D. R. ranks in the lowest quarter of the group in [both]
of the areas. Hence his initials appear in the cell representing the in-
tersection of row 1 and column 1. (c) E. K. ranks in the third quarter
in ability (row 3) and the first quarter in achievement (column 1).
(d) N. T. ranks in the second quarter in ability (row 2) but in the
fourth quarter in achievement (column 4).

Both E. K. and N. T. might be [worthy of further] study on the part of
the teacher to try to identify possible reasons for the dis-
crepancy between achievement and ability.

SUGGESTED ADDITIONAL READINGS

[...] Student, Chicago: [...]

[...] of methods of summariz[...]
The semi-interquartile range (Q) is frequently used to de-
scribe the variability of scores around the median. Q is one-
half the interquartile range. In the case of a set of scores
which are distributed symmetrically around the median, the 
range of the middle 50 per cent of scores lies between the 
median plus Q and the median minus Q. 
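
The range, the interquartile range, and Q can be computed together for a single set of scores. The sketch below, in Python, is an illustration only. It uses the 30 science-test scores of Figure 20 (whose range of 17 agrees with that reported for class A), assumes unit score intervals, and uses the hypothetical helper name `percentile_point`. Values computed this way may differ slightly from those quoted for Figure 22 because of rounding.

```python
from collections import Counter

# The 30 science-test scores tallied in Figure 20.
scores = [
    24, 20, 26, 16, 23, 25, 30, 24,
    19, 21, 24, 28, 20, 23, 32, 26,
    21, 25, 21, 18, 32, 24, 23, 15,
    25, 22, 29, 26, 23, 26,
]


def percentile_point(scores, fraction):
    """Point below which `fraction` of the scores fall (unit intervals, continuous scale)."""
    frequencies = Counter(scores)
    target = fraction * len(scores)
    cumulative = 0
    for score in sorted(frequencies):
        f = frequencies[score]
        if cumulative + f >= target:
            return (score - 0.5) + (target - cumulative) / f
        cumulative += f
    raise ValueError("scores must not be empty and fraction must not exceed 1")


score_range = max(scores) - min(scores)      # highest score minus lowest score
q1 = percentile_point(scores, 0.25)
median = percentile_point(scores, 0.50)
q3 = percentile_point(scores, 0.75)
interquartile_range = q3 - q1
q = interquartile_range / 2                  # semi-interquartile range

# Number of scores lying between the median minus Q and the median plus Q.
inside = sum(1 for s in scores if median - q <= s <= median + q)

if __name__ == "__main__":
    print(f"range = {score_range}")
    print(f"Q1 = {q1}, median = {median}, Q3 = {q3}")
    print(f"interquartile range = {interquartile_range}, Q = {q}")
    print(f"{inside} of {len(scores)} scores lie within the median plus or minus Q")
```

Because this distribution is not perfectly symmetrical, the band from the median minus Q to the median plus Q holds only approximately half of the scores.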

EXAMINING RELATIONSHIPS 

For some purposes the teacher may wish to study the re- 
lationships between two or more sets of scores. For example, 
measures of ability and achievement are frequently compared. 

To study the relationship between two sets of scores or 
characteristics which may be evaluated along a scale, the 
teacher may use a scattergram such as that illustrated in Fig- 
ure 25. A scattergram such as that illustrated may help the 
teacher to locate pupils who (1) do not appear to be achiev-
ing at the level which might be expected of them, (2) are
achieving at a level higher than might be expected, and (3) 
are working at a level of reasonable expectancy, although 
their achievement is low. The scattergram merely organizes 
the data so that relationships such as those above are more 
evident. Like the other techniques presented in this appendix, 
it is a method of organizing data to clarify certain character- 
istics and relationships of score distributions. These tech- 
niques do not tell the teacher how to evaluate these observa- 
tions. For instance, from the scattergram illustrated in Figure 
25, it appears that C. F. and D. R., although at the extremes 
of the class in both ability and achievement, are placed about 
where we might expect to find them. E. K., on the other hand, 
is commonly termed an underachiever, since relative ability is 
considerably in excess of relative achievement. N. T. might be 
termed an overachiever, since, with low-average ability as 
compared to the group, he is achieving a relatively high level. 
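
The placing of pupils in the cells of a scattergram can be illustrated with a short program. This is a minimal sketch in Python, not a prescribed method. The quarter ranks are those described for the four pupils in Figure 25; the names `build_scattergram` and `discrepancy`, and the rule that a gap of two or more quarters is worth noting, are assumptions made only for this example. In keeping with the caution above, the program merely organizes the data and flags pupils for further study; it does not evaluate them.

```python
# Quarter ranks (1 = lowest quarter, 4 = highest) for the pupils shown in Figure 25.
pupils = {
    "C. F.": {"ability": 4, "achievement": 4},
    "D. R.": {"ability": 1, "achievement": 1},
    "E. K.": {"ability": 3, "achievement": 1},
    "N. T.": {"ability": 2, "achievement": 4},
}


def build_scattergram(pupils):
    """Map each (ability row, achievement column) cell to the initials placed in it."""
    grid = {(row, col): [] for row in range(1, 5) for col in range(1, 5)}
    for initials, ranks in pupils.items():
        grid[(ranks["ability"], ranks["achievement"])].append(initials)
    return grid


def discrepancy(ranks, threshold=2):
    """Describe the gap between quarter ranks; `threshold` is an assumed cutoff."""
    gap = ranks["achievement"] - ranks["ability"]
    if gap <= -threshold:
        return "achievement below ability"
    if gap >= threshold:
        return "achievement above ability"
    return "about as expected"


grid = build_scattergram(pupils)

if __name__ == "__main__":
    for row in range(4, 0, -1):              # highest ability quarter printed first
        cells = [", ".join(grid[(row, col)]) or "-" for col in range(1, 5)]
        print(f"ability {row}: " + " | ".join(f"{cell:^7}" for cell in cells))
    for initials, ranks in pupils.items():
        print(f"{initials}: {discrepancy(ranks)}")
```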



Glossary


ability. Power to perform a specified act. Capacity for accomplish-
ment as opposed to potential.

absolute measure. Units of measurement defined and interpreted
in terms of a fixed standard or basis, e.g., units of linear
measurement such as inches and feet.

achievement test. A test designed to measure the extent to which
an individual has acquired certain knowledges or [skills] as
a result of a program of instruction.

[adequacy]. Inclusion in a test of sufficient samples of behavior or
[...] to constitute a good indication of [...].

age norm. Average score obtained by pupils of a given [age]. The
score or value representative of a certain [age]. (See
grade norm.)

ambiguity. Capacity of a test item to be interpreted in [more than]
one way; such items are subject to various meanings
and thus undesirable as test items.

anecdotal record. A series of brief, written descriptions of typical
behaviors of a pupil.

appraisal. An evaluation based on data from many sources or con-
cerning multiple facets of personality and achievement.

aptitude. Potential or combination of potentials indicative of
probable capacity to learn in some particular area.
(One may have an aptitude for music without possessing
[the ability] to perform musically.)

[attitude scale]. A set of questions or hypothetical situations designed
[to reveal] one's mental predispositions or [...]. A de-
Chapter 3 presents an account of statistical measures as an aid
to the analysis of test results. 

Thorndike, R. L., and E. Hagen: Measurement and Evaluation 
in Psychology and Education, New York: John Wiley & Sons, 
Inc., 1955. 

Chapter 5 introduces statistical concepts related to the study 
and interpretation of test scores and distributions. 

Wrightstone, J. W., J. Justman, and I. Robbins: Evaluation in 
Modern Education, New York: American Book Company, 1956. 
An appendix, pp. 447-457, presents a concise discussion of 
fundamental statistical concepts. 
cumulative record. A card or folder [which pro]vides blanks or
spaces for the [recording] of significant data about the child's
development. [Birth] date, family data, address, school record,
results of [aptitude] and achievement tests, personality schedules,
and anecdotal records are among the data commonly recorded.

deciles. Points in a distribution of scores which divide the distri-
bution into ten equal parts in terms of frequency of scores.

derived score. A score which has been converted from the raw
score, e.g., age scores, grade [scores], standard scores, per-
centile scores. Derived scores [...] to a given raw
score.

descriptive rating scale. A rating scale providing a continuum that
presents verbal descriptions [...] of the degrees
of possession of the trait being [rated].

diagnosis. The interpreting of data in [such] a way as to determine
what the specific causes of a pupil's difficulty are. Also, the
verbalized statement of the interpretation of data.

diagnostic test. A test that indicates specific difficulty,
usually in skill subjects such as reading, spelling, or arith-
metic. The actual diagnosis [of] difficulty is made by [the]
teacher or clinician who interprets [the] test results.

economy. The characteristic of [a test which is] figured
on the basis of how [...] and how many pupils [...].
Economy also relates to the saving of the teacher's [time] in
test construction, [scor]ing, and evaluation.

educational age. A derived score expressed in [terms of the]
age score earned by pupils [of] a given age on a [series] of sub-
ject-matter tests or on [...] specific subject areas. Often
expressed as a grade equivalent. [...]
score may be translated [into a] grade-placement
or mental-age score.
vice for estimating how one will act or believe or what beliefs
and actions one has readiness for.
average. See mean.

capacity. Potentiality for the development of a skill or knowledge. 
(One may have the capacity, or the potentiality for devel- 
oping the ability, to play the piano.) 
character test. A device used to evaluate that aspect of personality 
which relates to ethical, moral, and religious situations and 
concerns the right and wrong of conduct. Measures the inner, 
consistent trends of behavior. 

check list. A device for gathering data by means of a list of pre- 
determined items. The respondent has only to mark the 
items which are pertinent in his response. 
coefficient of correlation (or validity, or reliability). A numerical 
expression of the extent of agreement between two measures 
or measuring instruments. It is expressed in decimal fractions 
ranging from a plus 1.00 (perfect positive agreement) 
through 0.0 (no relationship one way or another) to a
minus 1.00 (perfect negative relationship — the more of one, 
the less of the other). 

comparability. See equivalent test. The quality of tests that makes 
it possible to use them as substitutes for one another. Having 
the same number of items of the same degree of difficulty 
and covering the same scope or range of material. 

completion test. A test made up of items consisting of a sentence 
or statement from which a word or words have been omitted. 
The student is expected to supply the missing word or words 
in giving his answer. 

constant alternatives. A device used in rating techniques to pro- 
vide a constant set of rating steps (e.g., excellent, good, fair, 
poor) which apply to a number of characteristics being 
rated. 

criterion. A model, point, or standard for comparison which pro- 
vides the basis for judging the merit of a test, behavior, or 
situation. 
group test. A pencil-and-paper test in which several subjects are
tested simultaneously by one examiner. Most classroom [tests]
are of the group variety.

“halo” effect. The result of influence of [the rater's general] impres-
sions of the subject on his evaluation of the individual with
respect to some particular quality or performance.

individual test. A test, either verbal [or nonverbal, in which the]
examiner tests one subject at a [time. The] Stanford-Binet and
Wechsler-Bellevue are examples of [individual] tests.

intelligence test. An evaluative device [designed] to express quan-
titatively the relative status of a [person] with respect to mental
maturity or level of mental [ability. May be] designed
to estimate general intellectual [ability or to] assess specified
intellectual or mental factors or char[acteristics].

interest inventory. A means of [measuring the degree] of attraction to
certain specified types of [activity]. May be called interest or
preference tests or personal preference [inventories. May be]
designed to assess vocational, educational, [...].

inventory. A term [...] meas-
urement areas such as [...] personality.

I.Q. Abbreviation for intelligence quotient, an indication of pres-
ent rate of mental growth [obtained] by dividing mental
age by chronological age and multiplying by 100. An index of
relative brightness based on [performance] on a test of intelligence
or mental ability. Typically derived [from the] ratio between
mental and chronological ages.

isolate. A person who is not chosen by any members of his
group on a sociometric test.

item analysis. A study of each [item] on a test for the purpose
of comparing the number of subjects (out of the total num-
ber taking the test) who [missed the] item and the number
who answered it correctly. [The] question is then studied to
detect ambiguity, validity, difficulty, and dis-
criminating power.
equivalent test. A test designed to sample exactly the same area
of behavior as another test, so that the scores do not vary 
significantly when the two tests are given under identical 
circumstances. An equivalent test should have the same 
number of items, sample the same areas, and contain items 
of equal degrees of difficulty. 

essay examination. A series of questions which the pupil is to 
answer by writing compositions. The answer to the question 
is “discussed” in writing by the pupil. 

evaluation. The process of determining the worth of a given indi- 
vidual’s personality, performance, or merit. Usually depends 
upon data from many sources and of many varieties. 

examination. See test. 

fixed pattern. A relatively consistent rating tendency on the part 
of the rater, e.g., consistently rating most pupils high, low, 
or average, thus failing to disperse ratings over the length of 
the scale. 

frequency distribution. A tabulation of scores (tally sheet) ar- 
ranged in serial order (e.g., from high to low) showing the 
number of scores falling at each point in the distribution. 

grade equivalent. See grade norm. 

grade norm. A derived score expressed in years and months of 
location in the elementary and high school. A grade score 
of 8.4 means that the subject’s score is about the same as 
that of the average child who has been in the eighth grade for
four months. The grade norm may be based on either the 
median or the average of the distribution of the scores for the 
grade level. 

grades. Numerical or alphabetical scores, assigned to a pupil in a 
given area of his school experience, which designate the 
value of his work and sometimes his conduct. The concept of 
grades and grading is open to criticism because it creates a 
tendency to ignore the nature and extent of individual differ- 
ences and because it employs a single index to measure a 
number of complex growth phenomena. 
tablished answer key provide a routine procedure for scoring
of the test items.

objectivity. Absence or minimizing of the personal element in an-
swering or in scoring a test or test item.

organismic. Denotes the intimate and inseparable nature of
the many facets of growth and behavior in [the]
individual. For example, organismic age refers to the [average]
of chronological, mental, emotional, carpal, physical,
physiological, etc., ages.

pencil-and-paper tests. Tests which require the subject to write
his responses. Used in contrast to performance tests in which
the examiner must record the responses or behaviors.

percentile. A point in a distribution of scores [below which a]
stated percentage of the cases falls. For [example,] percentile
30 is a point in a distribution below which 30 per cent [of the]
cases fall.

percentile rank. The percentile at which a given [score] falls. For
example, a percentile rank of 20 [means that 20 per cent]
of the subjects attained scores equal to [or below the speci]-
fied score.

performance test. See nonverbal [test. A] test in which the subject
is requested to do some motor act in [a] prescribed manner. A
test dependent on a work sample.

personality test. A test or inventory [designed] to reveal those per-
sonal characteristics of [an individual which] are considered
to be related to his personality. [These] instruments also may
be designated as adjustment, personality, [or ...] inven-
tories.

potentiality. An undeveloped aptitude [or] capacity. Sometimes
thought to be an [...] but probably also
dependent upon environmental factors for its nourishment.

practice effect. The influence of practice [or pre]vious experience
with a test upon current test [...].

prognostic test. A [test] designed to predict behavior or performance
in a particular area.
marks. Letters or numerical symbols representing attempts to re-
duce the complexities of school achievement to a single in- 
dex. See grades. 

matching test. An examination consisting of two lists of words 
or phrases in columns in which the task of the subject is to 
pair off each item in one of the lists with a related item in 
the other list. 

mean (arithmetic mean). The average obtained by dividing the 
sum of a group of scores by the number of individual scores 
in the group.

measurement. The application of a precise, quantitative unit of 
value to any property, quality, or outcome. 

median. The midpoint of a set of scores arranged in order from 
high to low. The point that divides the distribution of scores 
into two equal parts so that half the scores fall above and 
half below the median. 

nonverbal test. A test which does not require the individual to 
write or to read. Examples of nonverbal test items are piling 
blocks in a prescribed pattern, stringing beads, and assem- 
bling puzzles. 

normal curve. A graphic representation of a distribution of scores 
or measures having a distinctively bell-shaped appearance. 
Scores are distributed symmetrically about the mean with 
a concentration of scores around the central point and de- 
creasing frequencies toward the extremes. The normal curve 
has definite mathematical properties. 
norms. Measures, based on test scores, which describe the perform- 
ance of a specified group. Norms may describe average or 
typical performance or indicate the status of the individual 
or group with respect to the performance demonstrated by 
the specified group. 

objective test. An examination which can be scored with a mini- 
mum of influence from the scorer’s opinion or attitude as 
to whether it is right or wrong. Short answers and an es- 
relative measures. Measures based on comparisons or [relationships]
rather than on fixed or constant units. Test norms, for ex-
ample, are relative measures in that their derivation and
meaning are based on relationships to a series of scores
[rather] than on any absolute or fixed value.

reliability. The extent to which a test is consistent [in] measuring
what it purports to measure. Usually indicated by a coeffi-
cient of reliability or by an indication of the error of measure-
ment.

sampling. See wide sampling.

[scale]. [...] Subdivisions of
a trait or behavior are laid [out along a con]-
tinuous line. As one progresses along the scale [the] items
vary in nature, degree, or difficulty.

scaled test. A test in which the items become progressively more
difficult.

schedule (personality). An inventory of personality char-
acteristics.

scoring formula. A method of systematically [de]riving a score from
test data. Formulas may refer [to the] weighting of items or cor-
rections for guessing.

sociogram. A graphic representation [of the ...] preferences of
a specific group. A mapping [...] [preferen]ces.

sociometric test. An instrument designed to provide a basis for
evaluating the interpersonal relationships prevailing among
the members of a group.

sociometry. The study of the [attrac]tions and rejections among var-
ious members of a group.

special aptitude. An indication of an individual's ability to acquire
a specified skill or knowledge.

standard. A mark or goal to [...]; should be distinguished
from a norm, which is [the average] or typical score rather
than the desirable score.

standard deviation. A measure of variability or dispersion of
scores around the mean of a distribution.

standardized test. A test [which] is a sample of perform-
ances taken under controlled conditions (of administration
projective technique. A method of studying personality or atti-
tudes through reactions to pictures, meaningless forms, or 
material to be assembled. It is assumed that the individual 
“projects” his personality, interests, or attitudes in develop- 
ing an interpretation of the materials presented to him. 

Q. The semi-interquartile range — that is, one-half the range of 
the middle 50 per cent of scores in a frequency distribution. 
quartiles. Points in a serially arranged distribution of scores which 
divide the distribution into four equal parts. 
questionnaire. A device designed to provide a rapid means of 
gathering data about an individual. Typically presents a list 
of statements or questions calling for a response. Frequently 
applied to personality and interest inventories. 
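
As a rough illustration of the entries for Q and quartiles, the following Python sketch finds the three quartile points of a set of scores and then takes Q as half the distance between the first and third quartiles. The sample scores are invented, the function names are hypothetical, and the interpolation rule (the default method of `statistics.quantiles`) is an assumption; hand methods for grouped frequency distributions may give slightly different values.

```python
import statistics


def quartile_points(scores):
    """Return (Q1, Q2, Q3): the points that divide the ordered
    scores into four equal parts."""
    q1, q2, q3 = statistics.quantiles(scores, n=4)
    return q1, q2, q3


def semi_interquartile_range(scores):
    """Return Q: one-half the range of the middle 50 per cent of scores."""
    q1, _, q3 = quartile_points(scores)
    return (q3 - q1) / 2


# Invented sample of eleven test scores.
scores = [52, 55, 58, 60, 61, 63, 65, 68, 70, 74, 79]

q1, q2, q3 = quartile_points(scores)
q = semi_interquartile_range(scores)
print(q1, q2, q3, q)
```

For these eleven scores the quartile points fall on the third, sixth, and ninth scores in order, so no interpolation is needed.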

rapport. In the area of tests and testing, the development of 
favorable attitudes on the part of the subject toward the 
test materials and testing procedures. 
rating scale. A set of criterion answers or models by means of
which values can be assigned to the samples being judged,
e.g., a handwriting scale. See also graphic rating scale and
descriptive rating scale.

raw scores. The result obtained from scoring the responses to 
test items. Usually the number of correct answers, but may 
involve required weighting of responses or application of a 
formula to correct for guessing. (See scoring formula.) 
readiness test. A test designed to help determine whether a child 
has the degree of maturation, physical and mental, and the 
fund of background experience which will enable him profit- 
ably to begin the study of the area to which the readiness 
test applies. 
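
The raw scores entry above mentions a formula to correct for guessing. One conventional form, assumed here for illustration and not taken from this glossary, subtracts a fraction of the wrong answers from the right answers: S = R - W/(k - 1), where k is the number of choices per item. The function name and the sample figures in this Python sketch are hypothetical.

```python
def corrected_score(right, wrong, choices_per_item):
    """Apply the conventional correction for guessing,
    S = R - W / (k - 1).

    Omitted items are neither right nor wrong and do not enter
    the formula.
    """
    if choices_per_item < 2:
        raise ValueError("an item needs at least two choices")
    return right - wrong / (choices_per_item - 1)


# True-false items (k = 2): the formula reduces to rights minus wrongs.
true_false = corrected_score(right=40, wrong=10, choices_per_item=2)

# Four-choice items (k = 4): one-third of a point is deducted per wrong answer.
multiple_choice = corrected_score(right=40, wrong=12, choices_per_item=4)

print(true_false, multiple_choice)
```

The deduction shrinks as the number of choices grows, since a blind guess is less likely to succeed on an item with more alternatives.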

recall item. A test question which requires the student to remem- 
ber the word or answer which is most appropriate. Examples 
of the recall question are completion and essay questions. 
recognition item. A test item that requires the subject to identify 
an answer. Examples are multiple-choice, true-false and 
matching questions. 
and scoring) and providing a basis for interpretation in the
form of norms or comparable information. A test for which 
norms have been established. 

standard score. Refers to a score based on the variability of a dis- 
tribution of scores around the mean of the distribution. The 
basic unit of such scales is the standard deviation. 
stencil. As used in this book, not simply a mat for reproducing a 
test but a piece of stiff paper in which holes are punched to 
reveal correct responses on an answer sheet. 
subjectivity. As used in the area of evaluation or measurement, 
this term typically refers to the fact that the judgment of 
the person scoring responses to test items may be a deciding 
factor in evaluation of the responses. 
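
The standard score entry above describes scores expressed in standard-deviation units around the mean. A minimal Python sketch of that idea follows. The sample scores are invented, the function name is hypothetical, and the choice of the population standard deviation (`statistics.pstdev`) is an assumption, since the glossary does not say which form is intended.

```python
import statistics


def standard_scores(scores):
    """Express each score as its distance from the mean,
    measured in standard-deviation units."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)  # population form; an assumption
    if sd == 0:
        raise ValueError("all scores are identical; no variability to scale by")
    return [(score - mean) / sd for score in scores]


# Invented sample: mean 55, standard deviation 2.
scores = [52, 54, 54, 54, 55, 55, 57, 59]
z = standard_scores(scores)
print(z)
```

A score at the mean receives a standard score of zero, and a score one standard deviation above the mean receives a standard score of one, whatever the original units of the test.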

test. An instrument designed to measure any quality, ability, skill, 
or knowledge. Usually a sampling, comprised of test items, 
of the area it is designed to measure. 
test manual. A pamphlet or booklet that accompanies most stand- 
ardized tests. Explains the purpose of the test, sometimes re- 
lates its historical development, cites statistical data obtained 
during standardization of the test, presents and interprets 
norms, cites limitations, and gives careful directions for ad- 
ministrating and scoring. 

trait. One limited aspect of personality or character, e.g., honesty, 
sincerity, intelligence, determination, initiative, etc. 

true-false test. A series of statements which the testee is to indicate
are either correct or incorrect. Sometimes an alternative
is provided so that the item can be marked “doubtful” or 
“questionable.” 

validity. The characteristic of a test of really sampling what it is 
designed to sample. A valid test of reading really indicates 
skill in reading rather than knowledge of a given area or skill 
in vocabulary.

verbal test. A test which requires the use of verbal or language 
skills. Many group intelligence tests consist largely of verbal 
items. 
function of, 95
interpretation of, 96
as reference points, 45-47
standard scores as, 57-63, 97
meaning of, 206
Objectivity, meaning of, 14
Test data, organization of, 262
Test items, analysis of, 91 
Test results, use of, 113 
Test scores, ranking of, 262 
Testing program, check list for. 


Tests, achievement, types of, 90 
adequacy of, 23 
administration of, rules for, 


in 

attitudes toward, 3 
committee on, 29 
economy of, 24, 91 
equivalent, 22 
and instruction, 6 
interpretation of, 112 
manuals, 25 
meaning of, 1 
performance, 7 

publishers of, 34 


Tests, purposes of, 7 
selection of, 28 

precautions in, 40 
standardization of, 48 
types of, 6 

uses and limitations of, 2 
verbal, 7 

(See also specific tests) 
Thinking, factors in, 92 
Traits, personality, 125 
True-false questions, construc- 
tion of, 208 
defects of, 207 


Validity, 15 
Verbal test, 7 

Vocabulary age, 51 
Vocational Interest Blank, 167 
Sampling, adequacy of, 23
in test standardization, 47-49 
wide, need for, 47 
Scale of Social Distance, 175 
Scales, 8 

(See also Attitude scales; Rat- 
ing scales) 

Scattergram, 274 
Score, derived, 45 
meaning of, 46 
raw, 46 

Scoring, techniques for, 215 
formulas, 207, 218 
stencils, 217 
Selection of tests, 28 
Self-appraisal, pupil, 231 
Semi-interquartile range, 273 
Skills, learning, development of, 
testing of, 90

Social acceptability, factors re- 
lated to, 143 
significance of, 143 
study of, 145 

Social adjustment, approaches to 
study of, 145 
evaluation of, 252 
and school progress, 143 
Social distance scale, 175 
Social relationships and person- 
ality, 121 

Socio-economic status and in- 
telligence, 80 
analysis of results, 156
individual representation, 156
meaning of, 152
suggestions for drawing, 154
values of, 155
Sociometric method, choice
nature of, 145
results, scoring of, 149
uses of, 155-159
Standard deviation, 58 

as basis for units of measurement
in relation to standard-score
norms, 60-63 
Standard score, 57-63, 97 
advantages of, 63 
derivation of, 57-59 
and percentile rank, 62 
uses and limitations, 63
Standardization, 10, 60 

related to teacher-made tests, 
218 

shortcomings of, 204 
steps in, 48 

Strong Vocational Interest Blank, 
Subject-matter tests, 254
Survey testing, 91 

Teacher-made tests, 202 
set-ups for, 214 
values of, 204 

Teacher-pupil conferences, 134, 
233 

Teacher-pupil-parent confer- 
ences, 233 

Test construction, problems in, 48

suggestions for, 211 